1 Introduction


1.1 Purpose of this book 


The book is designed for students in statistics at the master's level. It focuses on problem solving in the
field of statistical inference and should be regarded as a complement to textbooks such as Wackerly et al
2007, Mathematical Statistics with Applications or Casella & Berger 1990, Statistical Inference. The author
has noticed that many students, although well aware of the statistical ideas, fall short when faced
with the task of solving problems. This requires knowledge of statistical theory, but also of
how to apply proper methodology and useful tricks. The aim of the book is to bridge the gap between
theoretical knowledge and problem solving.


Each of the following chapters contains a minimum of the theory needed to solve the problems in the
Exercises. The latter are of two types. Some exercises with solutions are interspersed in the text, while
others, called Supplementary Exercises, follow at the end of the chapter. The solutions to the latter are
found at the end of the book. The intention is that the reader should try to solve these problems with
the solutions of the preceding exercises in mind. Towards the end of each of the following chapters there
is a section called ‘Final Words’. Here some important aspects are considered, some of which might have
been overlooked by the reader.


1.2 Chapter content and plan of the book 


Emphasis will be on the core areas of statistical inference: Point estimation - Confidence intervals -
Tests of hypotheses. More specialized topics such as Prediction, Sample Survey, Experimental Design,
Analysis of Variance and Multivariate Analysis will not be considered since they require too much space
to be accommodated here. Results in the core areas are based on probability theory. Therefore we
first consider some probabilistic results, together with useful mathematics. The set-up of the following
chapters is as follows.


- Ch. 2 Basic properties of discrete and continuous (random) variables are considered and examples
  of some common probability distributions are given. Elementary pieces of mathematics are
  presented, such as rules for differentiation and integration. Students who feel that their prerequisites
  are insufficient in these topics are encouraged to practice hard, while others may skip much
  of the content of this chapter.

- Ch. 3 The chapter is mainly devoted to sampling distributions, i.e. the distributions of quantities
  that are computed from a sample, such as sums and variances. In more complicated cases
  methods are presented for obtaining asymptotic or approximate formulas. Results from this
  chapter are essential for understanding the results that are derived in the subsequent chapters.

- Ch. 4 Important concepts in point estimation are introduced, such as the likelihood of a sample
  and sufficient statistics. Statistics used for point estimation of unknown quantities in the
  population are called estimators. (Numerical values of the latter are called estimates.) Some
  requirements on ‘good’ estimators are mentioned, such as being unbiased, consistent and having
  small variance. Four general methods for obtaining estimators are presented: Ordinary Least
  Squares (OLS), Moment, Best Linear Unbiased Estimator (BLUE) and Maximum Likelihood
  (ML). The performance of various estimators is compared. Due to limited space other estimation
  methods have to be omitted.

- Ch. 5 The construction of confidence intervals (CIs) for unknown parameters in the population
  by means of so-called pivotal statistics is explained. Guidelines are given for determining the
  sample size needed to get a CI of a certain coverage probability and of a certain length. It is also
  shown how CIs for functions of parameters, such as probabilities, can be constructed.

- Ch. 6 Two alternative ways of testing hypotheses are described, the p-value approach and the
  rejection region (RR) approach. When a statistic is used for testing hypotheses it is called a test
  statistic. Two general principles for constructing test statistics are presented, the Chi-square
  principle and the Likelihood Ratio principle. Each of these gives rise to a large number of
  well-known tests. It is therefore a sign of statistical illiteracy to refer to a test as ‘the’ Chi-square
  test (probably meaning the well-known test of independence between two
  qualitative variables). Furthermore, some miscellaneous methods are presented. A part of the
  chapter is devoted to nonparametric methods for testing goodness-of-fit and equality of two or
  more distributions, and to Fisher’s exact test for independence.

  A general expression for the power (the ability of a test to discriminate between the alternatives)
  is derived for (asymptotically) normally distributed test statistics and is applied to some
  special cases.

  When several hypotheses are tested simultaneously, we increase the probability of rejecting a
  hypothesis when it is in fact true. (This is one way to ‘lie’ when using statistical inference; more
  examples are given in the book.) One solution to this problem, called the Bonferroni-Holm
  correction, is presented.

  We finally give some tests for linear models, although this topic would perhaps require its
  own book. Here we consider the classical Gauss-Markov model and simple cases of models
  with random coefficients.

From the above one might get the impression that statistical testing is in some sense more ‘important’
than point and interval estimation. This is however not the case. It has been noticed that good point
estimators also work well for constructing good CIs and good tests. (See e.g. Stuart et al 1999, p. 276.) A
frequent question from students is: Which is best, to make a CI or to make a test? A nice answer to this
somewhat controversial question can be found in an article by T. Wonnacott, 1987. He argues that in
general a CI is to be preferred to a test because a CI is more informative. For the same reason he
argues for the p-value approach over the RR approach. However, in practice there are situations where
the construction of CIs becomes too complicated. The computation of p-values may also be complicated.
E.g. in nonparametric inference (Ch. 6.2.4) it is often much easier to make a test based on the RR approach
than to use the p-value approach, and the latter is in turn simpler than making a CI. An approach based
on testing is also much easier to use when several parameters have to be estimated simultaneously.


1.3 Statistical tables and facilities 


A great deal of the problem solving is devoted to the computation of probabilities. For continuous variables
this means that areas under frequency curves have to be computed. To this end various statistical tables
are available. When using these there are two different quantities of interest.

- Given a value on the x-axis, what is the probability of a larger value, i.e. how large is the area
  under the curve above the value on the x-axis? This may be called computation of a p-value.
- Given a probability, i.e. an area under the curve, what is the value on the x-axis that produced the
  probability? This may be called computation of an inverse p-value.


Statistical tables can show lower-tail areas or upper-tail areas. Lower-tail areas are areas below values
on the x-axis and upper-tail areas are areas above. The reader should check carefully whether a p-value
or an inverse p-value is required and whether the table shows lower- or upper-tail
areas. This actually seems to be a stumbling block for many students. It may therefore be helpful to
remember some special cases for the normal, Student’s T, Chi-square and F distributions. (These will
be defined in Ch. 2.2.2 and Ch. 3.1.) The following values may serve as reference points:


- In the normal distribution the area under the curve above 1.96 is 0.025. The area under the curve
  below 1.96 is thus $1 - 0.025 = 0.975$.

- In Student’s T distribution one needs to know the degrees of freedom (df) in order to determine
  the areas. With df = 1 the area under the curve above 12.706 is 0.025.

- In the Chi-square distribution with df = 1 the area under the curve above $3.84 \approx (1.96)^2$ is
  $2 \cdot 0.025 = 0.05$.

- In the F distribution one needs to know a pair of degrees of freedom, sometimes denoted
  (numerator, denominator) $= (f_1, f_2)$. With $f_1 = 1 = f_2$ the area under the curve above
  $161.45 \approx (12.706)^2$ is $2 \cdot 0.025 = 0.05$.
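
The reference values listed above can also be checked without a table. The following is a minimal sketch that assumes Python and its standard library only (the book itself uses SAS, so the function names here are purely illustrative). It relies on two facts: Student's T with df = 1 is the standard Cauchy distribution, whose cdf is $1/2 + \arctan(x)/\pi$, and Chi-square(1) and F(1,1) variables are the squares of a standard normal and a T(1) variable, respectively.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def normal_upper(x):
    """Upper-tail area P(Z > x) for Z ~ N(0,1)."""
    return 1.0 - Z.cdf(x)


def t1_upper(x):
    """Upper-tail area P(T > x) for Student's T with df = 1 (standard Cauchy)."""
    return 0.5 - math.atan(x) / math.pi


def chi2_1_upper(x):
    """Upper-tail area for Chi-square with df = 1, using Z**2 ~ Chi-square(1)."""
    return 2.0 * normal_upper(math.sqrt(x))


def f11_upper(x):
    """Upper-tail area for F with df (1, 1), using T**2 ~ F(1, 1) for T with df = 1."""
    return 2.0 * t1_upper(math.sqrt(x))


print(round(normal_upper(1.96), 3))   # p-value: area above 1.96
print(round(Z.inv_cdf(0.975), 2))     # inverse p-value: point with lower-tail area 0.975
print(round(t1_upper(12.706), 3))     # area above 12.706 in T(1)
print(round(chi2_1_upper(3.84), 3))   # area above 3.84 in Chi-square(1)
print(round(f11_upper(161.45), 3))    # area above 161.45 in F(1, 1)
```

Note that `Z.cdf` returns a lower-tail area, so the upper-tail area is obtained as one minus the cdf. This is exactly the lower-tail versus upper-tail distinction that the reader is warned about above.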

Calculation of probabilities is facilitated by using either statistical program packages, so-called ‘calculators’
or printed statistical tables.

- Statistical program packages. These are the most reliable to use, and both p-values and
  inverse p-values can easily be computed with programs such as SAS or SPSS, to
  mention just a few. E.g. in SAS the function probt can be used to find p-values for Student’s
  T distribution and the function tinv to find inverse p-values. However, read the manuals carefully.
- ‘Calculators’. These have quite recently appeared on the internet. They are easy to use
  (enter a value and click on ‘calculate’) and they are often free. They are especially helpful for
  calculating areas in the F-distribution. An example is found at the address
  http://vassarstats.net/tabs.html.
- Printed tables. These are often found in statistical textbooks. Quality can be uneven, but
  an example of an excellent table is the table of the Chi-square distribution in Wackerly
  et al, 2007. This shows both small lower-tail areas and small upper-tail areas. Many tables
  can be downloaded from the internet. One example from the University of Glasgow is
  http://www.stats.gla.ac.uk.

Throughout this book we will compute exact probabilities obtained from functions in the program package
SAS. However, it is frequently enough to see whether a p-value is above or below 0.05, and in such cases
it will suffice to use printed tables.

2 Basic probability and mathematics


2.1 Probability distributions of discrete and continuous random variables


A variable that is dependent on the outcome of an experiment (in a wide sense) is called a random
variable (or just variable) and is denoted by an upper case letter, such as Y. A particular value taken by
Y is denoted by a lower case letter y. For example, let Y = ‘Number of boys in a randomly chosen family
with 4 children’, where Y may take any of the values y = 0,...,4. Before the ‘experiment’ of choosing such
a family we do not know the value of y. But, as will be shown below, we can calculate the probability
that the family has y boys. The probability of the outcome ‘Y = y’ is denoted P(Y = y) and since it is a
function of y it is denoted p(y). This is called the probability function (pf) of the discrete variable Y. A
variable that can take any value in some interval, e.g. waiting time in a queue, is called continuous. Such a
variable can be described by its density (frequency function) f(y), which
shows the relative frequency of values close to y.


Properties of p(y) (If not shown, summations are over all possible values of y.)

1) $0 \le p(y) \le 1$, $\sum p(y) = 1$.

2) Expected value, Population mean, of Y: $\mu = E(Y) = \sum y \cdot p(y)$, center of gravity.

3) Expected value of a function of Y: $E(g(Y)) = \sum g(y) \cdot p(y)$.

4) (Population) Variance of Y: $\sigma^2 = V(Y) = \sum (y-\mu)^2 \cdot p(y) = E(Y^2) - \mu^2$, dispersion around
   the population mean. The latter expression is often simpler for calculations. Notice that (3) is used
   with $g(y) = (y-\mu)^2$.

5) Cumulative distribution function (cdf) of Y: $F(y) = P(Y \le y) = p(y) + p(y-1) + \dots$ and Survival
   function $S(y) = P(Y > y) = p(y+1) + p(y+2) + \dots = 1 - F(y)$.

Properties of f(y) (If not shown, integration is over all possible values of y.)

1) $f(y) \ge 0$, $\int f(y)\,dy = 1$, $F(y) = \int_{-\infty}^{y} f(x)\,dx$, $S(y) = 1 - F(y)$.

2) $\mu = E(Y) = \int y \cdot f(y)\,dy$, center of gravity.

3) Expected value of a function of Y, g(Y): $E(g(Y)) = \int g(y) \cdot f(y)\,dy$.

4) (Population) Variance of Y: $\sigma^2 = V(Y) = \int (y-\mu)^2 f(y)\,dy = E(Y^2) - \mu^2$.

5) Cumulative distribution function (cdf) of Y: $F(y) = P(Y \le y) = \int_{-\infty}^{y} f(x)\,dx$ and Survival function
   $S(y) = P(Y > y) = \int_{y}^{\infty} f(x)\,dx = 1 - F(y)$.
6) The Population median, M, is obtained by solving the equation $F(M) = 1/2$ for M.
   One may define a median also for a discrete variable, but this can cause problems when trying
   to obtain a unique solution.

We illustrate these properties in two elementary examples. The
mathematics needed to solve the problems is found in Section 2.2.3.


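EX 1 You throw a symmetric six-sided die and define the discrete variable Y = ‘Number of dots that come up’. The pf of Y is
obviously $p(y) = 1/6$, $y = 1,\dots,6$.

2) $E(Y) = \sum_{y=1}^{6} y \cdot p(y) = \frac{1}{6}\sum_{y=1}^{6} y = \frac{1}{6} \cdot \frac{6(6+1)}{2} = \frac{7}{2}$

3) $E(Y^2) = \sum_{y=1}^{6} y^2 \cdot p(y) = \frac{1}{6}\sum_{y=1}^{6} y^2 = \frac{1}{6} \cdot \frac{6(6+1)(2 \cdot 6+1)}{6} = \frac{91}{6}$

4) $V(Y) = E(Y^2) - \mu^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}$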
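
The sums in EX 1 are easy to reproduce by direct summation over the pf. The sketch below assumes Python with exact rational arithmetic from the standard library; the helper name `expectation` is our own and simply implements property (3), $E(g(Y)) = \sum g(y) \cdot p(y)$.

```python
from fractions import Fraction

# pf of a symmetric six-sided die: p(y) = 1/6 for y = 1, ..., 6
pf = {y: Fraction(1, 6) for y in range(1, 7)}


def expectation(pf, g=lambda y: y):
    """E(g(Y)) as the sum of g(y) * p(y) over all possible values y."""
    return sum(g(y) * p for y, p in pf.items())


mean = expectation(pf)                             # E(Y)
second_moment = expectation(pf, lambda y: y ** 2)  # E(Y^2)
variance = second_moment - mean ** 2               # V(Y) = E(Y^2) - mu^2

print(mean, second_moment, variance)
```

The same helper works for any discrete pf given as a dictionary of value-probability pairs.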

EX 2 You arrive at a bus stop where buses run every ten minutes. Define the continuous variable Y = ‘Waiting time for
the next bus’. The density can be assumed to be $f(y) = 1/10$, $0 < y < 10$.

1) $\int f(y)\,dy = \int_0^{10} \frac{1}{10}\,dy = \frac{1}{10}(10 - 0) = 1$

2) $E(Y) = \int y \cdot f(y)\,dy = \int_0^{10} y \cdot \frac{1}{10}\,dy = \frac{1}{10}\left[\frac{y^2}{2}\right]_0^{10} = \frac{1}{10}\left(\frac{100}{2} - 0\right) = 5$

3) $E(Y^2) = \int y^2 f(y)\,dy = \int_0^{10} y^2 \cdot \frac{1}{10}\,dy = \frac{1}{10}\left[\frac{y^3}{3}\right]_0^{10} = \frac{1}{10}\left(\frac{1000}{3} - 0\right) = \frac{100}{3}$

4) $V(Y) = E(Y^2) - \mu^2 = \frac{100}{3} - 5^2 = \frac{25}{3}$

5) $F(y) = \int_0^{y} \frac{1}{10}\,dx = \frac{y}{10}$. So, $F(M) = \frac{M}{10} = \frac{1}{2} \Rightarrow M = 5$.

Here the median equals the mean, and this is always the case when the density is symmetric around the mean.

One may calculate probabilities such as the probability of having to wait more than 8 minutes,

$P(Y > 8) = \int_8^{10} \frac{1}{10}\,dy = \frac{1}{10}(10 - 8) = \frac{1}{5}$
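
When an integral cannot be solved by hand, the properties of f(y) can be approximated numerically. As an illustration, the sketch below recomputes the quantities of EX 2 with a simple midpoint rule. It assumes Python; the function `integrate` and the number of steps are our own choices and not part of the book.

```python
def integrate(h, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    width = (b - a) / steps
    return sum(h(a + (i + 0.5) * width) for i in range(steps)) * width


def f(y):
    """Density of the waiting time: 1/10 on (0, 10), zero elsewhere."""
    return 0.1 if 0 < y < 10 else 0.0


total = integrate(f, 0, 10)                          # should be close to 1
mean = integrate(lambda y: y * f(y), 0, 10)          # E(Y)
second = integrate(lambda y: y ** 2 * f(y), 0, 10)   # E(Y^2)
variance = second - mean ** 2                        # V(Y)
p_wait = integrate(f, 8, 10)                         # P(Y > 8)

print(round(total, 4), round(mean, 4), round(variance, 4), round(p_wait, 4))
```

The results agree with the exact values 1, 5, 25/3 and 1/5 up to a small numerical error.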


More generally, $\alpha_r = E(Y^r)$ is the rth moment and $\mu_r = E(Y - \mu)^r$ the rth central moment, r = 1,2,....


A bivariate random variable Y consists of a pair of variables $(Y_1, Y_2)$. If the latter are discrete the pf of Y
is $p(y_1, y_2) = P(Y_1 = y_1 \cap Y_2 = y_2)$, i.e. the probability of the simultaneous outcome. Given that $Y_2 = y_2$
the conditional probability of $Y_1$ is $p(y_1|y_2) = P(Y_1 = y_1 \mid Y_2 = y_2)$.


Properties of $p(y_1, y_2)$ (If not shown, summations are over all possible values of $y_1$ and $y_2$.)

1) $0 \le p(y_1, y_2) \le 1$, $\sum_{y_1}\sum_{y_2} p(y_1, y_2) = 1$.

2) $\sum_{y_2} p(y_1, y_2) = p(y_1)$, $\sum_{y_1} p(y_1, y_2) = p(y_2)$. Here $p(y_1)$ and $p(y_2)$ are marginal pfs.

3) $p(y_1|y_2) = \frac{p(y_1, y_2)}{p(y_2)}$, $p(y_2|y_1) = \frac{p(y_1, y_2)}{p(y_1)}$.

4) $\sum_{y_1} p(y_1|y_2) = \frac{1}{p(y_2)}\sum_{y_1} p(y_1, y_2) = \frac{p(y_2)}{p(y_2)} = 1$.

5) $Y_1$ and $Y_2$ are independent if $p(y_1|y_2) = p(y_1)$ or $p(y_2|y_1) = p(y_2)$ or $p(y_1, y_2) = p(y_1) \cdot p(y_2)$.

6) $E(g(Y_1) \cdot h(Y_2)) = \sum_{y_1}\sum_{y_2} g(y_1) h(y_2) p(y_1, y_2)$.

7) Covariance between $Y_1$ and $Y_2$:
   $\sigma_{12} = Cov(Y_1, Y_2) = \sum_{y_1}\sum_{y_2} (y_1 - \mu_1)(y_2 - \mu_2) p(y_1, y_2) = E(Y_1 Y_2) - \mu_1\mu_2$.
   Notice that $\sigma_{11} = Cov(Y_1, Y_1)$ is simply the variance of $Y_1$.

8) Correlation between $Y_1$ and $Y_2$:
   $\rho_{12} = \frac{\sigma_{12}}{\sigma_1 \cdot \sigma_2}$, where $\sigma_1^2 = V(Y_1)$ and $\sigma_2^2 = V(Y_2)$. Notice that $-1 \le \rho_{12} \le 1$.

9) The conditional expected value $\mu_{1|2} = E(Y_1 \mid Y_2 = y_2) = \sum_{y_1} y_1 \cdot p(y_1|y_2)$ is termed the regression
   function. If this is a linear function of $y_2$ the regression is linear, $\alpha + \beta \cdot y_2$, where $\alpha$ is an
   intercept and $\beta$ is the slope or regression coefficient.

10) The conditional variance $V(Y_1 \mid Y_2 = y_2) = \sum_{y_1} (y_1 - \mu_{1|2})^2 \cdot p(y_1|y_2) = E(Y_1^2 \mid Y_2 = y_2) - \mu_{1|2}^2$
    is the residual variance.


More generally, an n-dimensional random variable Y has n components $(Y_1,\dots,Y_n)$ and the pf is
$p(y_1,\dots,y_n) = P(Y_1 = y_1 \cap \dots \cap Y_n = y_n)$. This can represent the outcomes in a sample of n observations.
Assume for instance that we have chosen a sample of n families, each with 4 children. Define the variable
$Y_i$ = ‘Number of boys in family i’, i = 1,...,n. In this case it may be reasonable to assume that the number
of boys in one chosen family is independent of the number of boys in another family. The probability of
the sample is thus

$$p(y_1,\dots,y_n) = p(y_1) \cdots p(y_n) = \prod_{i=1}^{n} p(y_i) \qquad (1a)$$

If furthermore each $Y_i$ has the same pf we say that the sequence $(Y_i)_{i=1}^{n}$ is independently and identically
distributed (iid).

Similar relations hold for n-dimensional continuous variables. For n independent variables the joint
density is

$$f(y_1,\dots,y_n) = \prod_{i=1}^{n} f(y_i) \qquad (1b)$$

Linear form of random variables


Let $(Y_i)_{i=1}^{n}$ be variables with $E(Y_i) = \mu_i$, $V(Y_i) = \sigma_i^2 = \sigma_{ii}$ and $Cov(Y_i, Y_j) = \sigma_{ij}$. A linear form of the $Y_i$'s
is $L = \sum_{i=1}^{n} a_i Y_i$, where the $a_i$ are constants. It is easy to show the following (Wackerly, Mendenhall
& Scheaffer 2008, p. 271):

$$E(L) = \sum_{i=1}^{n} a_i\mu_i, \qquad V(L) = \sum_{i=1}^{n} a_i^2\sigma_{ii} + 2\sum\sum_{1 \le i<j \le n} a_i a_j \sigma_{ij} \qquad (2)$$

Consider e.g. the case n = 3, in which case $\sum\sum_{1 \le i<j \le 3} a_i a_j \sigma_{ij} = a_1 a_2\sigma_{12} + a_1 a_3\sigma_{13} + a_2 a_3\sigma_{23}$. We illustrate
the use of eq. (2) below.


EX 3 Variance of a sum and of a difference.

$V(Y_1 + Y_2) = [a_1 = a_2 = 1] = \sigma_{11} + \sigma_{22} + 2\sigma_{12}$, $V(Y_1 - Y_2) = [a_1 = 1, a_2 = -1] = \sigma_{11} + \sigma_{22} - 2\sigma_{12}$.
Assume further that $\sigma_{11} = \sigma_{22} = \sigma^2$, say. Then $\rho_{12} = \sigma_{12}/\sigma^2$ and it follows that
$V(Y_1 + Y_2) = 2\sigma^2(1 + \rho_{12})$ and $V(Y_1 - Y_2) = 2\sigma^2(1 - \rho_{12})$.

This last equation is interesting because it shows that the variance in data with positively correlated observations can
be reduced by forming differences. In fact $V(Y_1 - Y_2) \to 0$ as $\rho_{12} \to 1$. A typical example of positively correlated
observations is in ‘before-after’ studies, e.g. when body weight is measured for each person before and after a
slimming program.
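
Eq. (2) translates directly into a few lines of code. The sketch below assumes Python; the function name `linear_form_moments` and the numerical values (equal variances 4 and correlation 0.9 for a hypothetical ‘before-after’ study) are our own illustration and do not come from real data.

```python
def linear_form_moments(a, mu, sigma):
    """Mean and variance of L = sum(a[i] * Y[i]) according to eq. (2).

    a     : list of constants a_i
    mu    : list of means mu_i
    sigma : covariance matrix as nested lists, with sigma[i][i] = V(Y_i)
    """
    n = len(a)
    mean = sum(a[i] * mu[i] for i in range(n))
    var = sum(a[i] ** 2 * sigma[i][i] for i in range(n))
    # add twice the sum of a_i * a_j * sigma_ij over all pairs i < j
    var += 2 * sum(a[i] * a[j] * sigma[i][j]
                   for i in range(n) for j in range(i + 1, n))
    return mean, var


# Hypothetical values: equal variances s2 and correlation rho
s2, rho = 4.0, 0.9
sigma = [[s2, rho * s2], [rho * s2, s2]]
mu = [80.0, 78.0]

_, var_sum = linear_form_moments([1, 1], mu, sigma)    # V(Y1 + Y2)
_, var_diff = linear_form_moments([1, -1], mu, sigma)  # V(Y1 - Y2)
print(var_sum, var_diff)
```

With a correlation as high as 0.9 the variance of the difference is only a small fraction of the variance of the sum, in line with the formulas $2\sigma^2(1 + \rho_{12})$ and $2\sigma^2(1 - \rho_{12})$ in EX 3.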


2.2 Some distributions 


Many discrete and continuous distributions have been found to be workable models for several important
practical situations. Such distributions have been termed ‘families of distributions’ or ‘distributional laws’.
In this section we catalog some of these and give the basic assumptions on which they are based. We
also give means and variances and indicate important properties and applications in the examples that follow.

When a certain variable Y follows a certain law L we use the notation Y ~ L.


2.2.1 Discrete distributions 


1) Y ~ Bernoulli(p). Y is a variable that takes the value 1 with probability p and 0 with probability
   (1-p). The outcome Y = 1 is often termed a ‘success’ and the outcome Y = 0 is termed a ‘failure’.
   The pf is

   $p(y) = p^y (1-p)^{1-y}$, $y = 0, 1$

   with mean $\mu = p$ and variance $\sigma^2 = p(1-p)$.


2) Y ~ Binomial(n, p). The pf can be derived under the following assumptions: n independent
   repetitions are made of the same experiment that each time can result in one of the outcomes
   ‘success’ with probability p and ‘failure’ with probability (1-p). Define the variable Y = ‘Number
   of successes that occur in n trials’. The pf is

   $p(y) = \binom{n}{y} p^y (1-p)^{n-y}$, $y = 0, 1, \dots, n$

   with $\mu = np$ and $\sigma^2 = np(1-p)$. Notice that $Y = \sum_{i=1}^{n} Y_i$ where $(Y_i)_{i=1}^{n}$ is a sequence of iid
   variables, each ~ Bernoulli(p). For the meaning of $\binom{n}{y}$ see Ch. 2.3.5 below.
y 

3) Y ~ Geometric(p). Assumptions: Independent repetitions are made of the same experiment
   that each time can result in one of the outcomes ‘success’ with probability p and ‘failure’ with
   probability (1-p). Define the variable Y = ‘Number of trials when a ‘success’ occurs for the
   first time’. The pf is

   $p(y) = (1-p)^{y-1} p$, $y = 1, 2, \dots$

   with $\mu = 1/p$ and $\sigma^2 = (1-p)/p^2$. The survival function is $S(y) = P(Y > y) = (1-p)^y$. An
   interesting property of the Geometric distribution is the lack of memory, which means that the
   probability of a first ‘success’ in trial number (y+1), given that there have been no ‘successes’ in
   earlier trials, is the same as the probability of a ‘success’ in the first trial. Symbolically,

   $$P(Y = y+1 \mid Y > y) = \frac{P(Y = y+1 \cap Y > y)}{P(Y > y)} = \frac{P(Y = y+1)}{P(Y > y)} = \frac{(1-p)^y p}{(1-p)^y} = p = P(Y = 1)$$


4) Y ~ Poisson($\lambda$). The pf can be derived under a variety of different assumptions. One of
   the simplest ways to obtain the pf is to start with a variable that is Binomial(n, p) and to let
   $n \to \infty$, while at the same time $p \to 0$ in such a way that $n \cdot p \to \lambda$. In practice this means
   that n is large and p is so small that the product $n \cdot p = \lambda$ is moderate, say within the interval
   (0.5, 20). The pf is

   $p(y) = \frac{\lambda^y}{y!} e^{-\lambda}$, $y = 0, 1, \dots$

   with $\mu = \lambda$ and $\sigma^2 = \lambda$.

   A more general random quantity is Y(t). This is a counting function that describes the number
   of events that occur during a time interval of length t. It is called a stationary Poisson process
   of rate (intensity) $\lambda$ and the pf is

   $P(Y(t) = y) = \frac{(\lambda t)^y}{y!} e^{-\lambda t}$, $y = 0, 1, \dots$

   with $E(Y(t)) = \lambda t = V(Y(t))$. $\lambda$ can be interpreted as the expected number of events per unit
   time since

   $E\left(\frac{Y(t)}{t}\right) = \frac{1}{t} E(Y(t)) = \lambda$. Also, $V\left(\frac{Y(t)}{t}\right) = \frac{1}{t^2} V(Y(t)) = \frac{\lambda}{t}$.

   A Poisson process can be obtained under the assumption that the process is a superposition
   of a large number of independent general point processes, each of low intensity (Cox & Smith
   1954, p. 91).


Download free eBooks at bookboon.com 


Exercises in Statistical Inference 
with detailed solutions Basic probability and mathematics 


   Let X(s) and Y(t) be two independent Poisson processes of rates $\lambda_X$ and $\lambda_Y$, respectively, e.g.
   the number of road accidents during s and t hours on roads with and without limited speed. We
   are interested in comparing the two intensities in order to draw conclusions about the effect of
   limited speed on road accidents. One elegant way to do this is to use the Conditional Poisson
   Property (cf. Cox & Lewis 1968, p 223):

   $$(Y(t) \mid X(s) + Y(t) = n) \sim \text{Binomial}\left(n,\ p = \frac{\lambda_Y \cdot t}{\lambda_X \cdot s + \lambda_Y \cdot t}\right) \qquad (3)$$

   The problem of comparing two intensities can thus be reduced to the problem of drawing
   inference about one single parameter. Notice that if $\lambda_X = \lambda_Y$ then $p = t/(s+t)$.

   The discrete variable Y(t) that counts the number of events in intervals of length t is related
   to another, continuous variable that expresses the length between successive events. (Cf. the
   theorem (4) in Section 2.2.2.)

5) Y ~ (Discrete) Uniform(N). The pf is

   $p(y) = \frac{1}{N}$, $y = 1, 2, \dots, N$

   with $\mu = (N+1)/2$ and $\sigma^2 = (N^2 - 1)/12$. The distribution puts equal mass on each of the
   outcomes 1,2,...,N. A typical example with N = 6 is when you throw a symmetric six-sided
   die and count the number of dots coming up.


6) $(Y_1,\dots,Y_k)$ ~ Multinomial$(n, p_1,\dots,p_k)$. This is the only example of a discrete many-dimensional
   variable that is considered in this book. The pf is derived under the same assumptions as for a
   Binomial variable. However, instead of two outcomes at each single trial, there are k mutually
   exclusive outcomes $A_1,\dots,A_k$, where the probability of $A_i$ is $p_i$ and $\sum_{i=1}^{k} p_i = 1$. The pf of the
   variables $Y_i$ = ‘Number of times that $A_i$ occurs’, i = 1,...,k is

   $$p(y_1,\dots,y_k) = \frac{n!}{y_1! \cdots y_k!}\, p_1^{y_1} \cdots p_k^{y_k} \quad \text{with} \quad \sum_{i=1}^{k} y_i = n$$

   Verify that k = 2 gives the Binomial distribution. Here $\mu_i = E(Y_i) = n \cdot p_i$,
   $\sigma_i^2 = V(Y_i) = n \cdot p_i(1 - p_i)$ and $\sigma_{ij} = Cov(Y_i, Y_j) = -n \cdot p_i p_j$, $i \ne j$.


EX 4 Let Y be the variable ‘Number of boys in a randomly chosen family with 4 children’. This can be assumed to be
Binomial(n, p) with n = 4 and p = 53/103 = 0.515, the latter figure being obtained from population statistics in the
Scandinavian countries (106 boys born per 100 girls born). By using the pf in (2) above one gets

$p(0) = \binom{4}{0}(53/103)^0(50/103)^4 = 0.056$, $p(1) = \binom{4}{1}(53/103)^1(50/103)^3 = 0.235$,

$p(2) = \binom{4}{2}(53/103)^2(50/103)^2 = 0.374$, $p(3) = \binom{4}{3}(53/103)^3(50/103)^1 = 0.265$,

$p(4) = \binom{4}{4}(53/103)^4(50/103)^0 = 0.070$

These probabilities are very close to the actual relative frequencies. However, it should be kept in mind that the
calculations have been based on crude figures and the results may not be true in other populations. E.g. if both
parents are smokers the proportion of boys born is only 0.451, or 82 boys born per 100 girls born (Fukada et al 2002,
p. 1407).
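
The five probabilities in EX 4 can be reproduced with a direct implementation of the Binomial pf. This is a minimal sketch in Python using only the standard library; the function name `binomial_pf` is our own.

```python
from math import comb


def binomial_pf(y, n, p):
    """Binomial pf: p(y) = C(n, y) * p**y * (1 - p)**(n - y)."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)


n, p = 4, 53 / 103  # 4 children, proportion of boys taken from EX 4
probs = [binomial_pf(y, n, p) for y in range(n + 1)]

for y, prob in enumerate(probs):
    print(y, round(prob, 3))
```

Replacing p by 0.451 shows how the distribution shifts towards fewer boys in the population of smoking parents mentioned above.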

EX 5 In Russian roulette a revolver with room for 6 bullets is loaded with one bullet. You spin the cylinder, direct
the revolver towards your head and then fire. Define the variable Y = ‘Number of trials until the bullet hits your head for the
first time (and probably the last)’. The variable can be assumed to have a Geometric distribution with p = 1/6. In this
case it is perhaps not that interesting to compute the probability that the revolver fires after exactly y trials, but rather the
probability of surviving y trials. From the expression above in (3), Ch. 2.2.1, we get the survival function

$S(y) = P(Y > y) = (5/6)^y$, $y = 1, 2, \dots$

A few values are:

y    | 1    | 2    | 3    | 4    | 5    | 6
S(y) | 0.83 | 0.69 | 0.58 | 0.48 | 0.40 | 0.33

The median is somewhere between 3 and 4 trials, which implies that after 4 successive trials more than half of the candidates
will have been hit by the bullet. Russian roulette has been a motif in several films such as ‘The Deer Hunter’, ‘The
Way of the Gun’ and ‘Leon’, to mention just a few. The next time you are watching such a film you should have
the table above in mind.
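
The table in EX 5 and the lack-of-memory property of the Geometric distribution can be checked with a few lines of code. The sketch below assumes Python; the function names are our own, and the median is taken here as the smallest y with $S(y) \le 1/2$, which is one of several possible conventions for a discrete variable.

```python
def geometric_pf(y, p):
    """p(y) = (1 - p)**(y - 1) * p: first 'success' occurs in trial y."""
    return (1 - p) ** (y - 1) * p


def survival(y, p):
    """S(y) = P(Y > y) = (1 - p)**y: no 'success' in the first y trials."""
    return (1 - p) ** y


p = 1 / 6
table = {y: round(survival(y, p), 2) for y in range(1, 7)}
print(table)

# Smallest number of trials after which at most half of the candidates survive
median = next(y for y in range(1, 100) if survival(y, p) <= 0.5)
print(median)

# Lack of memory: P(Y = y + 1 | Y > y) does not depend on y and equals p
conditional = [geometric_pf(y + 1, p) / survival(y, p) for y in range(1, 7)]
```

Every entry of `conditional` equals p = 1/6 up to rounding error, so having survived earlier trials does not change the risk in the next one.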


EX 6 Let X(s) be a Poisson process of rate $\lambda_X$ representing the number of road accidents on a road segment.
During 12 months it is noticed that there have been 18 accidents, so that $\lambda_X$ may be put equal to 18/12 = 1.5. One
can now calculate the probability of several outcomes such as

- At least one accident in s months, $P(X(s) \ge 1) = \sum_{x \ge 1} p(x) = 1 - p(0) = 1 - e^{-1.5 s}$,
  which tends to 1 with increasing values of s.

- At least one accident in 1 month, $P(X(1) \ge 1) = 1 - e^{-1.5} = 0.777$.

- At least two accidents in 1 month, $P(X(1) \ge 2) = 1 - p(0) - p(1) = 1 - e^{-1.5} - 1.5 \cdot e^{-1.5} = 0.442$.

- At least two accidents in one month given that at least one accident has occurred,
  $P(X(1) \ge 2 \mid X(1) \ge 1) = \frac{P(X(1) \ge 2 \cap X(1) \ge 1)}{P(X(1) \ge 1)}$ = [The intersection of the two events in the numerator
  is simply $X(1) \ge 2$] $= \frac{0.442}{0.777} = 0.569$.
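
The probabilities in EX 6 follow directly from the pf of the Poisson process. The sketch below assumes Python with the standard library; the function name `poisson_pf` and its argument names are our own.

```python
import math


def poisson_pf(y, rate, t=1.0):
    """P(Y(t) = y) for a stationary Poisson process with the given rate."""
    m = rate * t  # expected number of events during time t
    return m ** y * math.exp(-m) / math.factorial(y)


rate = 18 / 12  # 18 accidents in 12 months

at_least_one = 1 - poisson_pf(0, rate)             # P(X(1) >= 1)
at_least_two = at_least_one - poisson_pf(1, rate)  # P(X(1) >= 2)
conditional = at_least_two / at_least_one          # P(X(1) >= 2 | X(1) >= 1)

print(round(at_least_one, 3), round(at_least_two, 3), round(conditional, 3))
```

Changing the argument t gives the corresponding probabilities for other period lengths, e.g. `1 - poisson_pf(0, rate, t=3)` for at least one accident in 3 months.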
P 


EX 7 Assume that speed limits are introduced on the road segment in EX 6 and that after this one observes 3 accidents in 3
months. The observed rate of accidents has thus decreased from 1.5 to 1.0 per month. Does this imply that restricted speed has
had an effect on accidents, or is the decrease just temporary? We will later present some ways to tackle this question
(cf. Ch. 6), but for the moment we just show how the problem of comparing two rates can be reformulated.

Let Y(t) be the Poisson process of accidents during time t after the introduction of speed limits and let the rate be
$\lambda_Y$. According to formula (3) in this section the variable $(Y(3) \mid X(12) + Y(3) = 21)$ is Binomial(n, p) with n = 21 and
$p = \lambda_Y \cdot 3/(\lambda_X \cdot 12 + \lambda_Y \cdot 3)$. If $\lambda_X = \lambda_Y$ then p = 1/5, to be compared with the observed proportion 3/21 = 1/7.




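EX 8 $(Y_1, Y_2, Y_3)$ is a Multinomial variable $(n, p_1, p_2, p_3)$. The pf is
$p(y_1, y_2, y_3) = \frac{n!}{y_1!\, y_2!\, y_3!}\, p_1^{y_1} p_2^{y_2} p_3^{y_3}$. The
outcomes are often referred to as cell frequencies.

The mean and variance of $Y_1 - Y_2$ are

$E(Y_1 - Y_2) = \mu_1 - \mu_2 = np_1 - np_2 = n \cdot (p_1 - p_2)$

$V(Y_1 - Y_2) = \sigma_{11} + \sigma_{22} - 2\sigma_{12} = np_1(1 - p_1) + np_2(1 - p_2) + 2np_1 p_2$ = [After some
re-arrangements, and noting that $\sigma_{12} = -np_1 p_2$] $= n \cdot \left((p_1 + p_2) - (p_1 - p_2)^2\right)$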
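
Results of this kind are easy to get wrong by a sign, so a brute-force check is worthwhile. By eq. (2) with $\sigma_{12} = -np_1 p_2$, the variance of $Y_1 - Y_2$ is $np_1(1-p_1) + np_2(1-p_2) + 2np_1 p_2 = n((p_1 + p_2) - (p_1 - p_2)^2)$. The sketch below, which assumes Python and uses the hypothetical values n = 5 and $(p_1, p_2, p_3) = (1/2, 3/10, 1/5)$, computes the mean and variance by summing over all possible outcomes of the Multinomial pf with exact fractions.

```python
from fractions import Fraction
from math import factorial


def trinomial_pf(y1, y2, y3, p1, p2, p3):
    """Multinomial pf with k = 3 cells and n = y1 + y2 + y3 trials."""
    n = y1 + y2 + y3
    coef = factorial(n) // (factorial(y1) * factorial(y2) * factorial(y3))
    return coef * p1 ** y1 * p2 ** y2 * p3 ** y3


# Hypothetical parameter values chosen only for illustration
n = 5
p1, p2, p3 = Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)

mean = Fraction(0)
second = Fraction(0)
for y1 in range(n + 1):
    for y2 in range(n + 1 - y1):
        prob = trinomial_pf(y1, y2, n - y1 - y2, p1, p2, p3)
        d = y1 - y2
        mean += d * prob         # accumulates E(Y1 - Y2)
        second += d ** 2 * prob  # accumulates E((Y1 - Y2)^2)
variance = second - mean ** 2

print(mean, n * (p1 - p2))
print(variance, n * ((p1 + p2) - (p1 - p2) ** 2))
```

Each printed pair consists of the enumerated value followed by the value given by the formula, and the two agree exactly.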


2.2.2 Continuous distributions


A convenient way to summarize the properties of a continuous distribution is to calculate the (symmetric)
variation limits $(c_1, c_2)$. These are the limits within which a certain percentage of all observations will fall.
E.g. the 95% limits are obtained by solving the two equations $P(Y < c_1) = 0.025$ and $P(Y > c_2) = 0.025$
for $c_1$ and $c_2$. (Cf. EX 9-EX 12.)


1) Uniform distribution on the interval [a,b], Y ~ Uniform[a,b].

   Density $f(y) = \begin{cases} \frac{1}{b-a}, & a \le y \le b \\ 0, & \text{otherwise} \end{cases}$, cdf $F(y) = \begin{cases} 0, & y < a \\ \frac{y-a}{b-a}, & a \le y \le b \\ 1, & y > b \end{cases}$

   It is easy to show that $\mu = (a+b)/2$ and $\sigma^2 = (b-a)^2/12$.


2) Gamma distribution, Y ~ Gamma($\lambda$, k)

   This is a class of distributions that is closely connected with the Gamma function $\Gamma(k)$ (cf.
   Section 2.3.5). The general form of the density is

   $f(y) = \frac{\lambda^k}{\Gamma(k)}\, y^{k-1} e^{-\lambda y}$, $y > 0$, $\lambda > 0$, $k > 0$.

   Notice that the integral of the density over all values of y is 1, a property that can be used in
   computations. Two important special cases are:

   - Exponential distribution, k = 1, Y ~ Exponential($\lambda$), with density $f(y) = \lambda e^{-\lambda y}$.
   - Chi-square distribution with n degrees of freedom (df), $\lambda = 1/2$ and $k = n/2$,
     $Y \sim \chi^2(n)$.

   The cdf can only be expressed explicitly if k is a positive integer, $F(y) = 1 - \sum_{i=0}^{k-1} \frac{(\lambda y)^i}{i!} e^{-\lambda y}$.
   In the exponential case we thus get $F(y) = 1 - e^{-\lambda y}$. An important theorem that links the
   Exponential distribution to the Poisson process in Section 2.2.1 is the following:


   Let $(X_i)_{i=1}^{\infty}$ be a sequence of times between successive events. Then we have the identity

   Y(t) is a Poisson process of rate $\lambda$ $\iff$ 1) each $X_i$ ~ Exponential($\lambda$) and 2) the $X_i$ are independent. (4)

   This gives us a simple clue to determine whether a given sequence of events follows the Poisson
   law or not: (1) Make a histogram of the $X_i$ and compare it with the Exponential density, (2) Make
   a plot of each interval length versus the length of the following interval ($X_i$ versus $X_{i+1}$) and
   compute the correlation. For more refined methods the reader is referred to a book by Cox &
   Lewis 1966, p. 152.

   For Y ~ Gamma($\lambda$, k) we have $\mu = k/\lambda$ and $\sigma^2 = k/\lambda^2$. More generally $E(Y^r) = \frac{\Gamma(k+r)}{\lambda^r\,\Gamma(k)}$,
   which holds for $r = \dots, -2, -1, 0, 1, 2, \dots$ provided that $k + r > 0$. Special case: $k = 1 \Rightarrow \alpha_r = E(Y^r) = \frac{r!}{\lambda^r}$, $r = 0, 1, 2, \dots$

   The following theorem makes it possible to calculate areas under the Gamma density by using
   tables for Chi-square variables that are found in most textbooks:

   $$Y \sim \text{Gamma}(\lambda, k) \Rightarrow 2\lambda Y \sim \chi^2(2k) \qquad (5)$$

   An application of this is given in EX 11 below.
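
3) Weibull distribution, Y ~ W($\alpha$, $\lambda$). This has the density

   $f(y) = \lambda \cdot \alpha \cdot y^{\alpha-1} e^{-\lambda y^{\alpha}}$, $y > 0$, $\alpha > 0$, $\lambda > 0$

   Here $\mu = \frac{\Gamma(1 + 1/\alpha)}{\lambda^{1/\alpha}}$ and $\sigma^2 = \frac{\Gamma(1 + 2/\alpha) - \Gamma^2(1 + 1/\alpha)}{\lambda^{2/\alpha}}$. The cdf is $F(y) = 1 - e^{-\lambda y^{\alpha}}$. This
   distribution is obtained from the relation $Y = X^{1/\alpha}$, where X ~ Exponential($\lambda$).

   Applications can be found in survival analysis and reliability engineering.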


4) Normal distribution, $Y \sim N(\mu, \sigma^2)$ has the density

   $f(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$, $-\infty < y < \infty$,

   where $\mu$ and $\sigma^2$ are the mean and variance, respectively. A standard normal variable is obtained
   by putting $\mu = 0$ and $\sigma^2 = 1$. The latter is denoted $Z \sim N(0,1)$ and will be used to compute
   areas under the normal density in a way that is described in EX 12 below. Notice that
   $Z = (Y - \mu)/\sigma$; this transformation is called standardization.

   The normal distribution can be obtained as a limiting distribution in several ways. Some of
   these are listed below in (a) to (c), where the one in (a) is formulated as a theorem due to its
   importance. A proof of (a) can be found in Casella & Berger 1990, p. 217. A proof of (c) can
   be found in Cramer 1957, p. 250.

   a) Central Limit Theorem (CLT). Let $(Y_i)_{i=1}^{n}$ be a sequence of independent and identically distributed (iid) variables
      with mean $\mu$ and variance $\sigma^2$. Then the cdf of the standardized variable

      $Z_n = \frac{\sum_{i=1}^{n} Y_i - n\mu}{\sqrt{n\sigma^2}} = \frac{\bar{Y} - \mu}{\sqrt{\sigma^2/n}}$ tends to the cdf of $Z \sim N(0,1)$ as $n \to \infty$.

      This is denoted $Z_n \xrightarrow{D} Z$, as $n \to \infty$. (6)

   b) If Y ~ Binomial(n, p) then $Z_n = \frac{Y - np}{\sqrt{np(1-p)}} \xrightarrow{D} Z \sim N(0,1)$, as $n \to \infty$.

   c) If Y(t) is a Poisson process with rate $\lambda$ then $Z(t) = \frac{Y(t) - \lambda t}{\sqrt{\lambda t}} \xrightarrow{D} Z \sim N(0,1)$ as $t \to \infty$, or
      alternatively, with t = 1, $Z(1) \xrightarrow{D} Z \sim N(0,1)$ as $\lambda \to \infty$.


   Comments

   - The CLT was first formulated and proved by the French mathematician Laplace about 1778
     (the exact year is hard to establish). Notice that it is the standardized variable that has a normal
     distribution as a limit. In some textbooks you may find expressions like ‘$\bar{Y}$ has a limiting Normal
     distribution with mean $\mu$ and variance $\sigma^2/n$’. But this is not true since the distribution of $\bar{Y}$
     tends to a ‘one-point’ distribution at $\mu$ with variance zero.

   - As you might suspect, the result in (b) is simply a consequence of the CLT since Y ~ Binomial(n, p) can
     be expressed as $Y = \sum_{i=1}^{n} Y_i$ where the $Y_i$ are iid with a Bernoulli distribution. However, this result
     was published earlier than the CLT, on November 12, 1733, by the French mathematician
     de Moivre, and it seems to be the first time that the formula of the normal density appears.

   - Further results were later obtained by the German mathematician K.F. Gauss (1809) and
     the Russians Markov (1900) and Liapounov (1901). It has been found that the limiting Z
     distribution exists under less restrictive assumptions than those mentioned in (a) above.
   - Many distributions are related to $Z \sim N(0,1)$, e.g. $Z^2 \sim \chi^2(1)$.
   - If $Y_i \sim N(\mu_i, \sigma_i^2)$ then $L = \sum a_i Y_i \sim N$ with mean and variance given in (2), Ch. 2.1.
5) Laplace distribution, Laplace's first law or the double Exponential distribution, $Y \sim L(\mu, b)$. The density and cdf are

$f(y) = \frac{1}{2b} e^{-|y-\mu|/b}$ and $F(y) = \begin{cases} \frac{1}{2} e^{(y-\mu)/b}, & y < \mu \\ 1 - \frac{1}{2} e^{-(y-\mu)/b}, & y \ge \mu \end{cases}$

with mean $\mu$ and variance $\sigma^2 = 2b^2$.


This distribution and its generalizations to non-symmetric cases have important applications in engineering and finance.


EX 9 Assume that waiting times are distributed $U[0,b]$. Compute the mean and the median waiting time and also the 95 % variation limits.

$\mu = b/2$, $F(M) = M/b$ = (Put) = $1/2 \Rightarrow M = b/2$.

95 % variation limits are obtained from: $P(Y < c_1) = F(c_1) = c_1/b$ = (Put) = $0.025 \Rightarrow c_1 = 0.025b$,

$P(Y > c_2) = 1 - F(c_2) = 1 - c_2/b$ = (Put) = $0.025 \Rightarrow c_2 = 0.975b$.

The 95 % variation limits are thus $(0.025b, 0.975b)$. E.g. if a bus runs every 20 minutes from a bus stop, 95 % of the waiting times will range from 0.5 to 19.5 minutes.


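EX 10 Intervals between arrivals to an intensive care unit are distributed Exponential($\lambda$). Compute the mean and median interval and give the 95 % variation limits.

$\mu = 1/\lambda$, $F(M) = 1 - e^{-\lambda M}$ = (Put) = $1/2 \Rightarrow e^{-\lambda M} = 1/2 \Rightarrow \lambda M = \ln(2)$, so $M = \ln(2)/\lambda$.

$P(Y < c_1) = 1 - e^{-\lambda c_1}$ = (Put) = $0.025 \Rightarrow e^{-\lambda c_1} = 0.975 \Rightarrow c_1 = -\ln(0.975)/\lambda \approx 0.025/\lambda$.

$P(Y > c_2) = 1 - P(Y \le c_2) = e^{-\lambda c_2}$ = (Put) = $0.025 \Rightarrow c_2 = -\ln(0.025)/\lambda \approx 3.69/\lambda$.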
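The calculations in EX 9 and EX 10 can be checked numerically. The following is a minimal sketch that inverts the two cdfs directly; the function names and the rate $\lambda = 1$ are assumptions chosen for illustration and do not come from the text.

```python
import math

def uniform_limits(b, coverage=0.95):
    """Mean, median and central variation limits for Y ~ U[0, b]."""
    tail = (1.0 - coverage) / 2.0
    return {"mean": b / 2.0, "median": b / 2.0,
            "lower": tail * b, "upper": (1.0 - tail) * b}

def exponential_limits(lam, coverage=0.95):
    """Same quantities for Y ~ Exponential(lam), with F(y) = 1 - exp(-lam*y)."""
    tail = (1.0 - coverage) / 2.0

    def quantile(q):
        # Inverse of the cdf: solve 1 - exp(-lam*y) = q for y.
        return -math.log(1.0 - q) / lam

    return {"mean": 1.0 / lam, "median": quantile(0.5),
            "lower": quantile(tail), "upper": quantile(1.0 - tail)}

bus = uniform_limits(20.0)          # a bus every 20 minutes, as in EX 9
arrivals = exponential_limits(1.0)  # lam = 1 is an arbitrary illustrative rate
print(bus)
print(arrivals)
```

For other rates the Exponential limits simply scale by $1/\lambda$, in agreement with the expressions derived in EX 10.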
EX 11 Assume that service times (minutes) for a customer at a cash machine are distributed Gamma($\lambda = 2$, $k = 2$). Determine the mean and median service times and give the 95 % variation limits for the service times.

$\mu = k/\lambda = 2/2 = 1$.

$P(Y < M)$ = [Notice the trick] = $P(2\lambda Y < 2\lambda M) = P(\chi^2(4) < 2\lambda M)$ = (Put) = $1/2$. From a table of the Chi-square distribution we get $2\lambda M = 3.36 \Rightarrow M = 3.36/4 = 0.84$.

$P(Y < c_1) = P(2\lambda Y < 2\lambda c_1) = P(\chi^2(4) < 2\lambda c_1)$ = (Put) = $0.025$. The same table gives $2\lambda c_1 = 0.48$, so $c_1 = 0.12$.

$P(Y > c_2) = P(2\lambda Y > 2\lambda c_2) = P(\chi^2(4) > 2\lambda c_2)$ = (Put) = $0.025$. From this $2\lambda c_2 = 11.14$, so $c_2 = 2.79$.

In this example we have used the theorem in (5).
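The table look-ups in EX 11 can be cross-checked without tables. The sketch below assumes the closed-form cdf of a Gamma variable with $k = 2$, $F(y) = 1 - e^{-\lambda y}(1 + \lambda y)$, and solves $F(y) = q$ by bisection; the helper names and the search interval are illustrative choices, not part of the text.

```python
import math

def gamma2_cdf(y, lam):
    """cdf of Gamma(lam, k=2): F(y) = 1 - exp(-lam*y) * (1 + lam*y), y >= 0."""
    return 1.0 - math.exp(-lam * y) * (1.0 + lam * y)

def gamma2_quantile(q, lam, lo=0.0, hi=100.0, tol=1e-10):
    """Solve F(y) = q by bisection; F is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gamma2_cdf(mid, lam) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

lam = 2.0
median = gamma2_quantile(0.5, lam)
c1 = gamma2_quantile(0.025, lam)
c2 = gamma2_quantile(0.975, lam)
print(round(median, 2), round(c1, 2), round(c2, 2))
```

The results should agree with the table-based values of EX 11 up to the rounding of the tabulated Chi-square quantiles.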


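EX 12 $Y \sim N(\mu, \sigma^2)$. Determine the 95 % variation limits for $Y$.

$P(Y < c_1)$ = [Notice the trick] = $P\left(\frac{Y-\mu}{\sigma} < \frac{c_1-\mu}{\sigma}\right) = P\left(Z < \frac{c_1-\mu}{\sigma}\right)$ = (Put) = $0.025 \Rightarrow \frac{c_1-\mu}{\sigma} = -1.96 \Rightarrow c_1 = \mu - 1.96\sigma$. Similarly we get $c_2 = \mu + 1.96\sigma$.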
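The constant 1.96 in EX 12 is the 0.975 quantile of $N(0,1)$. A minimal sketch using Python's standard library is given here; the values $\mu = 100$ and $\sigma = 15$ are assumptions used only for illustration.

```python
from statistics import NormalDist

def normal_limits(mu, sigma, coverage=0.95):
    """Central variation limits mu -/+ z*sigma for Y ~ N(mu, sigma^2)."""
    tail = (1.0 - coverage) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)  # standard normal quantile
    return mu - z * sigma, mu + z * sigma

z_975 = NormalDist().inv_cdf(0.975)
c1, c2 = normal_limits(mu=100.0, sigma=15.0)  # illustrative parameter values
print(round(z_975, 2), round(c1, 1), round(c2, 1))
```

Standardizing first and then transforming back, as done here, is exactly the 'trick' used in the exercise.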


2.3 Mathematics 


Some mathematics will be needed when solving problems in statistical inference. Here we consider a few results that will be used later.


2.3.1 Functions of a single variable 


A function $y = f(x)$ maps one set of x-values on one set of y-values. The function is called one-to-one if only one x-value corresponds to each y-value. In such a case one can obtain the reversed map, the inverse function $x = f^{-1}(y)$. Consider the function $y = x^2$, $-\infty < x < \infty$, which maps values along the whole x-axis on the positive y-axis. It is not one-to-one since e.g. both $x = -1$ and $x = 1$ give $y = 1$. On the other hand, $y = x^2$, $0 < x < \infty$ is one-to-one with the inverse function $x = \sqrt{y}$.


Some simple functions 


- Straight line, $y = a + b \cdot x$, where $a$ is the intercept and $b$ is the slope.

- Exponential, $y = ab^x$. With $a = 1$ and $b = e \approx 2.7183$, $y = e^x$ having the following properties: $e^{-x} = 1/e^x$, $e^{x_1} \cdot e^{x_2} = e^{x_1 + x_2}$, $(e^{x_1})^{x_2} = e^{x_1 x_2}$.

- Power, $y = ax^b$.

- Logarithmic (natural), $y = \ln(x)$ having the following properties: $\ln(x) \to -\infty$ as $x \to 0$, $\ln(1) = 0$, $\ln(e) = 1$, $\ln(x_1 x_2) = \ln(x_1) + \ln(x_2)$, $\ln(x_1/x_2) = \ln(x_1) - \ln(x_2)$, $\ln(x^b) = b\ln(x)$, $\ln(e^x) = x$. If $y = \ln(x)$ then $e^y = x$.

- Logistic (S-curve), $y = e^l/(1 + e^l)$, where $l = a + b \cdot x$.

Linearization of non-linear functions 


- $y = ab^x$. Taking logarithms on both sides gives $y' = \ln(y) = \ln(ab^x) = \ln(a) + x\ln(b) = a' + b'x$. So $x$ plotted against $\ln(y)$ gives a straight line.

- $y = ax^b$. $y' = \ln(y) = \ln(ax^b) = \ln(a) + b\ln(x) = a' + bx'$. So $\ln(x)$ plotted against $\ln(y)$ gives a straight line.

- $y = e^l/(1 + e^l)$, with $l = a + b \cdot x$. Now $y/(1-y) = e^l$, so $y' = \ln(y/(1-y)) = l = a + b \cdot x$ and thus a plot of $x$ against $\ln(y/(1-y))$ gives a straight line.
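The first linearization can be illustrated numerically. The sketch below generates exact values of $y = ab^x$, fits a least-squares line to $\ln(y)$ against $x$ and transforms intercept and slope back; the helper `fit_line` and the values $a = 3$, $b = 1.5$ are assumptions made for this illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

a, b = 3.0, 1.5                       # illustrative parameter values
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [a * b ** x for x in xs]         # exact values of y = a * b^x

# ln(y) = ln(a) + x * ln(b), so the fitted line recovers ln(a) and ln(b).
intercept, slope = fit_line(xs, [math.log(y) for y in ys])
a_hat, b_hat = math.exp(intercept), math.exp(slope)
print(a_hat, b_hat)
```

With noisy data the same steps apply, but the back-transformed values are then only estimates of $a$ and $b$.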
2.3.2 Sums and products


The sum of $x_1, \dots, x_n$ is $x_1 + \dots + x_n = \sum_{i=1}^{n} x_i$. The $x_i$ are termed terms. Sometimes we drop the lower or upper index in the summation sign if they are obvious. The product of $x_1, \dots, x_n$ is $x_1 \cdots x_n = \prod_{i=1}^{n} x_i$. The $x_i$ are now termed factors.


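Some rules


- $(x_1 + x_2)^2 = x_1^2 + x_2^2 + 2x_1x_2$. More generally: $\left(\sum_{i=1}^{n} x_i\right)^2 = \sum_{i=1}^{n} x_i^2 + 2\sum_{1 \le i < j \le n} x_i x_j$. Notice that the last sum contains $(n^2 - n)/2$ terms of the form $x_i x_j$, so that, counting the factor 2, there are $n^2 - n$ cross products in all.

- $\prod_{i=1}^{n} e^{x_i} = e^{x_1} \cdots e^{x_n} = e^{\sum_{i=1}^{n} x_i}$.

- $\sum_{i=1}^{n} a x_i = a\sum_{i=1}^{n} x_i$, $\prod_{i=1}^{n} a x_i = a^n \prod_{i=1}^{n} x_i$, where $a$ is a constant.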


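EX 13 $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is termed the arithmetic mean. Obviously $\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = \sum_{i=1}^{n} x_i - n \cdot \bar{x} = 0$.

Let $a$ be an arbitrary constant. Then $\sum_{i=1}^{n} (x_i - a)^2$ is minimized if $a = \bar{x}$.

Proof: $\sum_{i=1}^{n} (x_i - a)^2$ = [Notice the trick] = $\sum_{i=1}^{n} \left((x_i - \bar{x}) + (\bar{x} - a)\right)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (\bar{x} - a)^2 + 2\sum_{i=1}^{n} (x_i - \bar{x})(\bar{x} - a) = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - a)^2 + 2(\bar{x} - a)\sum_{i=1}^{n} (x_i - \bar{x})$, where the last term is zero. The remaining term $n(\bar{x} - a)^2$ is non-negative and equals zero only when $a = \bar{x}$.

Notice that $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$. The latter expression is often simpler to use in calculations.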
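Both statements in EX 13 are easy to verify on a small data set. The following sketch uses five arbitrary numbers (an assumption made for illustration) to compare the direct and the shortcut formula for the sum of squares and to confirm that moving $a$ away from $\bar{x}$ increases $\sum (x_i - a)^2$ by exactly $n(\bar{x} - a)^2$.

```python
def sum_sq_about(xs, a):
    """Sum of squared deviations of the values xs from the constant a."""
    return sum((x - a) ** 2 for x in xs)

xs = [75.0, 50.0, 90.0, 72.0, 78.0]   # arbitrary illustrative values
n = len(xs)
x_bar = sum(xs) / n

direct = sum_sq_about(xs, x_bar)
shortcut = sum(x * x for x in xs) - sum(xs) ** 2 / n

# Shifting a away from the mean adds n * (x_bar - a)^2 to the sum of squares.
a = x_bar + 2.0
excess = sum_sq_about(xs, a) - direct
print(x_bar, direct, shortcut, excess)
```

The shortcut form needs only the two running totals $\sum x_i$ and $\sum x_i^2$, which is why it is convenient in hand calculations.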


2.3.3 Derivatives 


The derivative of $y = f(x)$ with respect to $x$ is the limit $f'(x) = \lim (f(x+h) - f(x))/h$ as $h \to 0$. Other notations for a derivative are $y'$, $\frac{dy}{dx}$, $\frac{df}{dx}$ or $D_x f$. Rather than having to calculate the limit it is easier to use the following rules.
Derivation rules

1) Special functions

$f(x)$:    $a + bx$    $x^b$          $e^x$    $e^{g(x)}$           $\ln(x)$
$f'(x)$:   $b$         $bx^{b-1}$     $e^x$    $g'(x)e^{g(x)}$      $1/x$

2) $f(x) = g(x) \pm h(x) \Rightarrow f'(x) = g'(x) \pm h'(x)$

3) $f(x) = g(x) \cdot h(x) \Rightarrow f'(x) = g'(x) \cdot h(x) + g(x) \cdot h'(x)$

4) $f(x) = g(x)/h(x) \Rightarrow f'(x) = (g'(x) \cdot h(x) - g(x) \cdot h'(x))/h^2(x)$

5) $f(x) = g(h(x)) \Rightarrow f'(x) = h'(x) \cdot g'(h)$. This is a very useful rule that is demonstrated in EX 14 below.


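EX 14 $f(x) = \ln(3x + 1)$. Put $h(x) = 3x + 1$ and $g(h) = \ln(h)$ in (5) above, with $h'(x) = 3$, $g'(h) = 1/h$; then $f'(x) = 3/(3x + 1)$.

$f(x) = \sqrt{2x}$. Put $h(x) = 2x$ and $g(h) = \sqrt{h} = h^{1/2}$, with $h'(x) = 2$, $g'(h) = \frac{1}{2} h^{-1/2} = \frac{1}{2\sqrt{h}}$. Then $f'(x) = 2 \cdot \frac{1}{2\sqrt{2x}} = \frac{1}{\sqrt{2x}}$.

$y = (x - a)^2$. $\frac{dy}{dx} = 2(x - a)$, $\frac{dy}{da} = (-1) \cdot 2(x - a) = 2(a - x)$. The function $y$ can be considered as a function of either $x$ or $a$.

$y = \sum_{i=1}^{n} (x_i - a)^2$. $\frac{dy}{dx_i}$ = [There is just one $x_i$] = $2(x_i - a)$, $\frac{dy}{da} = \sum_{i=1}^{n} (-1) \cdot 2(x_i - a) = 2\sum_{i=1}^{n} (a - x_i)$.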


Two important theorems about extreme values 


- If $f(x)$ has a local maximum (max) or minimum (min) at $x = x_0$ then this can be obtained by solving the equation $f'(x) = 0$ for $x = x_0$. Furthermore, from the sign of the second derivative $f''(x_0)$, we draw the following conclusions:

  $f''(x_0) > 0 \Rightarrow f(x)$ has a local min at $x = x_0$

  $f''(x_0) < 0 \Rightarrow f(x)$ has a local max at $x = x_0$

- If $f(x) > 0$ then $f(x)$ has a local max or min at the same x-value as $\ln(f(x))$.


EX 14 Does the function $f(x) = e^{-(x-1)^2}$ have any max/min-values? Since $f(x) > 0$ we prefer to study the simpler function $z(x) = \ln(f(x)) = -(x-1)^2$. Since $z'(x) = -2(x-1) = 0 \Rightarrow x_0 = 1$, this must be a value of interest. Now, $z''(x) = -2 < 0$, from which we conclude that the function has a local maximum at $x = 1$.
2.3.4 Integrals


The (Riemann) integral $\int_a^b f(x)\,dx$ is the area between $a$ and $b$ under the curve $f(x)$.


Integration rules


1) $\int_a^b f(x)\,dx = [F(x)]_a^b = F(b) - F(a)$, where $F$ is a primitive function to $f$. Since $F'(x) = f(x)$ we can use the derivation rules above to find primitive functions.

2) $\int_a^b (g(x) \pm h(x))\,dx = \int_a^b g(x)\,dx \pm \int_a^b h(x)\,dx$

3) $\int_a^b g(x) \cdot h(x)\,dx = [G(x)h(x)]_a^b - \int_a^b G(x)h'(x)\,dx$ (Partial integration), where $G$ is a primitive function to $g$.
EX 15 $\int_0^1 (1 - x)\,dx = \left[x - \frac{x^2}{2}\right]_0^1 = 1 - \frac{1}{2} - (0 - 0) = \frac{1}{2}$.

$\int_0^1 \frac{1}{\sqrt{x}}\,dx = \int_0^1 x^{-1/2}\,dx = \left[\frac{x^{1/2}}{1/2}\right]_0^1 = 2 - 0 = 2$.

$\int_0^\infty e^{-x}\,dx = \left[-e^{-x}\right]_0^\infty = -0 - (-e^{0}) = 1$. An area under an infinitely long interval can thus be finite. This is an example of a mathematical paradox since it would imply that we could paint an infinitely long fence, having an exponential shape, with a finite amount of paint.


2.3.5 Some special functions and relations 


Let $n$ be any of the integers $0, 1, 2, \dots$. Then $n!$ ('n factorial') equals 1 for $n = 0$ and $1 \cdot 2 \cdots n$ for $n > 0$.

The combination operator $\binom{n}{x} = \frac{n!}{x!(n-x)!}$. E.g. $\binom{5}{2} = \frac{5!}{2!\,3!} = 10$.


Some series 


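- $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, $\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$

- Geometric $\sum_{i=0}^{n} x^i = \frac{1 - x^{n+1}}{1 - x}$, $\sum_{i=0}^{\infty} x^i = \frac{1}{1 - x}$, provided that $-1 < x < 1$.

- Binomial $\sum_{i=0}^{n} \binom{n}{i} a^i b^{n-i} = (a + b)^n$

- Exponential $\sum_{i=0}^{\infty} \frac{x^i}{i!} = e^x$

- Taylor Let $f^{(i)}(a)$ be the $i$:th derivative of $f(x)$ computed at $x = a$ with $f^{(0)}(a) = f(a)$. Then $f(x) = \sum_{i=0}^{\infty} \frac{(x-a)^i}{i!} f^{(i)}(a)$. In practice this may be used to approximate $f(x)$ by a polynomial. E.g. $f(x) \approx f(a) + (x - a)f'(a) + \frac{(x-a)^2}{2} f''(a)$. In this case $f(x)$ has been approximated by a Taylor polynomial of order 2 about $a$.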


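EX 16

$\sum_{i=1}^{\infty} 0.8^i = \sum_{i=0}^{\infty} 0.8^i - 0.8^0 = \frac{1}{1 - 0.8} - 1 = 4$.

$\sum_{i=0}^{8} \binom{8}{i}$ = [Put $a = 1 = b$] = $(1 + 1)^8 = 256$.

Let $0 < p < 1$ and consider $\sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = (p + 1 - p)^n = 1$.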
Gamma function


For any $p > 0$, define the Gamma function $\Gamma(p) = \int_0^\infty x^{p-1} e^{-x}\,dx$. Tables of this function can be found in Standard Mathematical Tables. Tables can also be produced by using program packages such as SAS, SPSS or Statistica. The behavior of the function is quite complicated but we will only need the following properties:


- $\Gamma(p + 1) = p \cdot \Gamma(p)$
- $\Gamma(p + 1) = p!$ if $p = 0, 1, 2, \dots$


Cauchy-Schwarz inequality


Let $x_i$ and $y_i$ be real numbers. Then $\left(\sum x_i y_i\right)^2 \le \left(\sum x_i^2\right)\left(\sum y_i^2\right)$.


2.4 Final words 


Notice the difference between a discrete and a continuous variable when calculating probabilities. For a continuous variable $Y$ the probability $P(Y = y)$ is always 0. This implies that $P(Y \ge y) = P(Y > y)$. On the other hand, for a discrete variable, $P(Y \ge y) = P(Y = y) + P(Y > y)$.


The population median $M$ is a value such that $F(M) = 1/2$ and nothing else. The sample median $m$ is obtained by ranking the observations in a sample and letting $m$ be the observation in the middle, or the average of the two observations in the middle. $m$ may be used as an estimate of $M$.


In Ch. 2 we only considered discrete bivariate distributions. Continuous bivariate distributions are treated analogously. The essential difference is that all summation symbols in properties (1)-(10) are replaced by integrals.


The reader is encouraged to use the summation symbol $\sum_{i=1}^{n} x_i$ rather than $x_1 + \dots + x_n$ and the product symbol $\prod_{i=1}^{n} x_i$ rather than $x_1 \cdot \ldots \cdot x_n$. In the book we will use alternative symbols for division. To save space we write $a/b$ instead of $\frac{a}{b}$. A typical example is $\frac{a/b}{c/d + e/f}$.
3 Sampling Distributions


Data consist of observations $y_1, \dots, y_n$ (numerical values) that have been drawn from a population. The latter may be called a specific sample. If we want to guess, or estimate, the value of a population characteristic such as the population mean $\mu$ one may take the sample mean $\bar{y} = \sum y_i/n$. Any new sample of $n$ observations drawn from the population will give rise to a new set of y-values and thus also of $\bar{y}$. To understand this variation from sample to sample it is useful to introduce the concept of a random sample of size $n$, $Y_1, \dots, Y_n$. Throughout this book it will be assumed that the latter variables are independent so that the probability of the sample can be expressed as in (1a) and (1b).


The appropriateness of taking the sample mean as a guess for $\mu$ can be judged by studying the distribution of $\bar{Y}$ and calculating the dispersion around $\mu$. However, $\bar{Y}$ is just one possible function of $Y_1, \dots, Y_n$, and there might be other functions that are better in some sense. Every function of the n-dimensional variable is termed a statistic with the general notation $T = g(Y_1, \dots, Y_n)$. The distribution of $T$ is called a sampling distribution. If the purpose is to estimate a characteristic in the population, $T$ is called an estimator and a numerical value of $T$ is called an estimate, $t$. If the purpose is to find an interval $(T_1, T_2)$ that covers the population characteristic with a certain probability it is called a confidence interval (CI). Finally, the statistic is called a test-statistic if the purpose is to use it for testing a statistical hypothesis. In this chapter we consider some exact and approximate results of sampling distributions.


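3.1 Some exact sampling distributions


Sum of variables


1) $Y_i \sim \text{Bernoulli}(p) \Rightarrow \sum_{i=1}^{n} Y_i \sim \text{Binomial}(n, p)$

2) $Y_i \sim \text{Binomial}(n_i, p) \Rightarrow \sum_{i=1}^{k} Y_i \sim \text{Binomial}\left(\sum_{i=1}^{k} n_i, p\right)$

3) $Y_i \sim \text{Poisson}(\lambda_i) \Rightarrow \sum_{i=1}^{n} Y_i \sim \text{Poisson}\left(\sum_{i=1}^{n} \lambda_i\right)$

4) $Y_i \sim N(\mu_i, \sigma_i^2) \Rightarrow \sum_{i=1}^{n} a_i Y_i \sim N\left(\sum_{i=1}^{n} a_i \mu_i, \sum_{i=1}^{n} a_i^2 \sigma_i^2\right)$

5) Special case with $\mu_i = \mu$, $\sigma_i^2 = \sigma^2$ and $a_i = 1/n$: $\bar{Y} \sim N(\mu, \sigma^2/n)$

6) $Y_i \sim \text{Gamma}(\lambda, k_i) \Rightarrow \sum_{i=1}^{n} Y_i \sim \text{Gamma}\left(\lambda, \sum_{i=1}^{n} k_i\right)$, $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} \sim \text{Gamma}\left(n\lambda, \sum_{i=1}^{n} k_i\right)$

7) Special case with $k_i = 1$: $Y_i \sim \text{Exponential}(\lambda) \Rightarrow \sum_{i=1}^{n} Y_i \sim \text{Gamma}(\lambda, n)$

8) Special case with $\lambda = 1/2$ and $k_i = n_i/2$: $Y_i \sim \chi^2(n_i) \Rightarrow \sum_{i=1}^{k} Y_i \sim \chi^2\left(\sum_{i=1}^{k} n_i\right)$


Sum of quadratic forms


9) $Y_i \sim N(\mu, \sigma^2) \Rightarrow \frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$, or $\sum_{i=1}^{n} (Y_i - \mu)^2 \sim \sigma^2 \cdot \chi^2(n)$. Notice that the sign '~' (distributed as) can be treated in the same way as the equality sign.

10) $Y_i \sim N(\mu, \sigma^2) \Rightarrow \frac{(\bar{Y} - \mu)^2}{\sigma^2/n} \sim \chi^2(1)$, or $(\bar{Y} - \mu)^2 \sim \frac{\sigma^2}{n} \cdot \chi^2(1)$.


An important theorem on chi-square distributed quadratic forms is the following (Cochran, 1934).


Cochran's Theorem: Let $Q_1$, $Q_2$ and $Q_3$ be quadratic forms such that $Q_1 = Q_2 + Q_3$. Then

$Q_1 \sim \chi^2(n_1)$ and $Q_3 \sim \chi^2(n_3) \Rightarrow Q_2 \sim \chi^2(n_1 - n_3)$, and $Q_2$ and $Q_3$ are independent.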
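EX 17 Prove the relations in (9) and (10) above.

$Y_i \sim N(\mu, \sigma^2) \Rightarrow \frac{Y_i - \mu}{\sigma} \sim N(0,1) \Rightarrow \frac{(Y_i - \mu)^2}{\sigma^2} \sim \chi^2(1) \Rightarrow \sum_{i=1}^{n} \frac{(Y_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$, where the last step uses (8).

$Y_i \sim N(\mu, \sigma^2) \Rightarrow \bar{Y} \sim N(\mu, \sigma^2/n) \Rightarrow \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \Rightarrow \frac{(\bar{Y} - \mu)^2}{\sigma^2/n} \sim \chi^2(1)$.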


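EX 18 Use Cochran's Theorem to show that $Y_i \sim N(\mu, \sigma^2) \Rightarrow \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{\sigma^2} \sim \chi^2(n-1)$.

$Y_i - \mu = (Y_i - \bar{Y}) + (\bar{Y} - \mu) \Rightarrow \sum_{i=1}^{n} (Y_i - \mu)^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 + \sum_{i=1}^{n} (\bar{Y} - \mu)^2 + 2\sum_{i=1}^{n} (Y_i - \bar{Y})(\bar{Y} - \mu)$. Here the last term is $2(\bar{Y} - \mu)\sum_{i=1}^{n} (Y_i - \bar{Y}) = 0$ (cf. EX 13). So,

$\frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{\sigma^2} + \frac{(\bar{Y} - \mu)^2}{\sigma^2/n}$, or $Q_1 = Q_2 + Q_3$.

The result now follows from (9) and (10) above.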


EX 18 (Continued) The sample variance is defined as $S^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1} \sim \frac{\sigma^2}{n-1} \cdot \chi^2(n-1)$. Notice that $Q_2$ is a function of $S^2$ and $Q_3$ is a function of $\bar{Y}$. Since $Q_2$ and $Q_3$ are independent it follows that $S^2$ and $\bar{Y}$ are independent random variables. So, if we repeatedly compute $S^2$ and $\bar{Y}$ in samples from a normal distribution we will obtain a zero correlation. This may seem amazing since $S^2$ is functionally dependent on $\bar{Y}$, but it illustrates that statistical dependency and functional dependency are two different concepts.
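The claim that $\bar{Y}$ and $S^2$ are uncorrelated in normal samples can be examined by simulation. The sketch below is only an illustration under assumed settings ($n = 10$, 20000 replicates, standard normal data, a fixed seed); a sample correlation close to zero is consistent with independence but does not prove it.

```python
import random

def simulate_mean_and_variance(n=10, reps=20000, mu=0.0, sigma=1.0, seed=1):
    """Draw reps normal samples of size n; return lists of Y-bar and S^2."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        y_bar = sum(sample) / n
        s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)
        means.append(y_bar)
        variances.append(s2)
    return means, variances

def correlation(u, v):
    """Pearson correlation coefficient between two equally long lists."""
    m = len(u)
    u_bar, v_bar = sum(u) / m, sum(v) / m
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

means, variances = simulate_mean_and_variance()
r = correlation(means, variances)
mean_s2 = sum(variances) / len(variances)   # should be close to sigma^2 = 1
print(round(r, 3), round(mean_s2, 3))
```

Replacing `rng.gauss` by a skewed distribution such as `rng.expovariate` typically produces a clearly positive correlation, which shows that the independence is specific to the normal distribution.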


Ratios


11) Student's T with $f$ degrees of freedom, $T(f)$

$Z \sim N(0,1)$ and $V \sim \chi^2(f)$ are independent $\Rightarrow \frac{Z}{(V/f)^{1/2}} \sim T(f)$

Tables showing areas under the density of $T$ can be found in most elementary text books.


12) Variance ratio F with $f_1$ and $f_2$ degrees of freedom, $F(f_1, f_2)$

$V_1 \sim \chi^2(f_1)$ and $V_2 \sim \chi^2(f_2)$ are independent $\Rightarrow \frac{V_1/f_1}{V_2/f_2} \sim F(f_1, f_2)$

Tables showing areas under the density of $F$ can also be found in elementary textbooks, but these are more comprehensive and seldom show areas for all values of $f_1, f_2$. Sometimes one can use the fact that $F(f_1, f_2) = 1/F(f_2, f_1)$.


Order statistics 


A random sample of $n$ independent observations $(Y_i)_{i=1}^{n}$ can be arranged in increasing order, from the smallest to the largest, $Y_{(1)} < Y_{(2)} < \dots < Y_{(n)}$. Here only the distributions of the smallest and largest observations $Y_{(1)}$ and $Y_{(n)}$ are considered. We also restrict ourselves to the case with continuous variables.


The distributional properties are summarized in the following theorem:


$Y_{(1)}$ has cdf $F_{Y_{(1)}}(y) = 1 - \prod_{i=1}^{n} (1 - F_{Y_i}(y))$ = [If all $Y_i \sim Y$] = $1 - (1 - F_Y(y))^n$.

In the latter case $f_{Y_{(1)}}(y) = n f_Y(y)(1 - F_Y(y))^{n-1}$.

$Y_{(n)}$ has cdf $F_{Y_{(n)}}(y) = \prod_{i=1}^{n} F_{Y_i}(y)$ = [If all $Y_i \sim Y$] = $(F_Y(y))^n$. (8)

In the latter case $f_{Y_{(n)}}(y) = n f_Y(y)(F_Y(y))^{n-1}$.


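EX 19 Determine the cdf and density of $Y_{(1)}$ if all $Y_i \sim Y \sim \text{Exponential}(\lambda)$.

$F_Y(y) = 1 - e^{-\lambda y} \Rightarrow F_{Y_{(1)}}(y) = 1 - (e^{-\lambda y})^n = 1 - e^{-n\lambda y}$, $f_{Y_{(1)}}(y) = n\lambda e^{-n\lambda y}$. Thus, the smallest of $n$ observations is Exponential($n\lambda$), so the expected value of $Y_{(1)}$ is $\frac{1}{n\lambda}$.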


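EX 20 Determine the cdf and density of $Y_{(n)}$ if all $Y_i \sim Y \sim \text{Uniform}[0, b]$.

$F_Y(y) = \frac{y}{b}$, $0 \le y \le b \Rightarrow F_{Y_{(n)}}(y) = \frac{y^n}{b^n}$, $f_{Y_{(n)}}(y) = \frac{n y^{n-1}}{b^n}$, $0 \le y \le b$.

$E(Y_{(n)}) = \int_0^b y \cdot \frac{n y^{n-1}}{b^n}\,dy = \frac{n}{b^n}\int_0^b y^n\,dy = \frac{n}{b^n} \cdot \frac{b^{n+1}}{n+1} = \frac{n}{n+1} \cdot b$.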
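The expected values found in EX 19 and EX 20 can be compared with simulated averages. The sketch below is a rough check under assumed settings ($n = 5$, $\lambda = 2$, $b = 1$, 50000 replicates, fixed seeds), all chosen for illustration only.

```python
import random

def mean_of_minimum_exponential(n, lam, reps=50000, seed=3):
    """Average of the smallest of n Exponential(lam) observations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += min(rng.expovariate(lam) for _ in range(n))
    return total / reps

def mean_of_maximum_uniform(n, b, reps=50000, seed=4):
    """Average of the largest of n Uniform[0, b] observations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += max(rng.uniform(0.0, b) for _ in range(n))
    return total / reps

n, lam, b = 5, 2.0, 1.0
sim_min = mean_of_minimum_exponential(n, lam)
sim_max = mean_of_maximum_uniform(n, b)
print(sim_min, 1.0 / (n * lam))     # theory: E(Y_(1)) = 1/(n*lam)
print(sim_max, n * b / (n + 1))     # theory: E(Y_(n)) = n*b/(n+1)
```

Note that the largest uniform observation systematically falls short of $b$, a fact that matters when $Y_{(n)}$ is used to estimate the upper end point.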


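3.2 Sample moments


In Ch. 2.1 we introduced the population moments $\alpha_r = E(Y^r)$ and the population central moments $\mu_r = E(Y - \mu)^r$. By means of the Binomial series in Ch. 2.3.5 we can express $\mu_r$ in terms of $\alpha_r$ in the following way.

$\mu_r = E(Y - \mu)^r = E\left(\sum_{i=0}^{r} \binom{r}{i} Y^i (-\mu)^{r-i}\right) = \sum_{i=0}^{r} \binom{r}{i} E(Y^i)(-\mu)^{r-i} = \sum_{i=0}^{r} \binom{r}{i} \alpha_i (-\mu)^{r-i}$.

From this we get e.g. $\mu_2 = \alpha_0\mu^2 - 2\alpha_1\mu + \alpha_2 = \alpha_2 - \alpha_1^2$, since $\alpha_0 = 1$ and $\alpha_1 = \mu$.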


The corresponding sample moments are $a_r = \frac{1}{n}\sum_{i=1}^{n} Y_i^r$ with $a_1 = \bar{Y}$ and $m_r = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^r$. Instead of studying the properties of $m_r$ in general we confine ourselves to $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$.


The following theorem gives some properties of sample moments.


If $(Y_i)_{i=1}^{n}$ are iid variables with mean $\mu$ and variance $\sigma^2$, then

$E(a_r) = \alpha_r$, $V(a_r) = \frac{1}{n}\left(\alpha_{2r} - \alpha_r^2\right)$ (9a)

$E(S^2) = \sigma^2$, $V(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)$ (9b)


The expression for $V(S^2)$ above is proved in a book by C.R. Rao 1965, p. 368. Proofs of the other relations are left as exercises for the reader in EX 21 below.


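EX 21 $E(a_r) = \frac{1}{n}E\left(\sum Y_i^r\right) = \frac{1}{n}\sum E(Y_i^r) = \frac{1}{n} \cdot n\alpha_r = \alpha_r$.

$E(a_r^2) = \frac{1}{n^2}E\left(\left(\sum Y_i^r\right)^2\right)$ = [cf. Ch. 2.3.2] = $\frac{1}{n^2}E\left(\sum Y_i^{2r} + 2\sum_{i<j} Y_i^r Y_j^r\right) = \frac{1}{n^2}\left(n\alpha_{2r} + 2\sum_{i<j} E(Y_i^r)E(Y_j^r)\right) = \frac{1}{n^2}\left(n\alpha_{2r} + (n^2 - n)\alpha_r^2\right) = \frac{\alpha_{2r}}{n} + \frac{n-1}{n}\alpha_r^2$.

So, $V(a_r) = E(a_r^2) - E^2(a_r) = \frac{1}{n}\left(\alpha_{2r} - \alpha_r^2\right)$.

$E\left(\sum (Y_i - \bar{Y})^2\right)$ = [cf. EX 13] = $E\left(\sum Y_i^2 - \left(\sum Y_i\right)^2/n\right) = \sum E(Y_i^2) - \frac{1}{n}E\left(\left(\sum Y_i\right)^2\right)$ = [Cf. expression above] = $n\alpha_2 - \frac{1}{n}\left(n\alpha_2 + (n^2 - n)\alpha_1^2\right) = (n-1)(\alpha_2 - \alpha_1^2) = (n-1)\sigma^2$. Dividing by $n - 1$ gives $E(S^2) = \sigma^2$.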


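EX 22 Let $(Y_i)_{i=1}^{n}$ be iid and distributed Exponential($\lambda$). Determine $V(S^2)$.

$\mu_2 = V(Y) = \frac{1}{\lambda^2}$ and from Ch. 2.2.2 (2) $\alpha_r = \frac{r!}{\lambda^r}$, so $\alpha_1 = \mu = \frac{1}{\lambda}$.

$\mu_4 = \sum_{i=0}^{4} \binom{4}{i} \alpha_i (-\mu)^{4-i} = \alpha_0\mu^4 - 4\alpha_1\mu^3 + 6\alpha_2\mu^2 - 4\alpha_3\mu + \alpha_4 = \frac{1}{\lambda^4} - \frac{4}{\lambda^4} + \frac{12}{\lambda^4} - \frac{24}{\lambda^4} + \frac{24}{\lambda^4} = \frac{9}{\lambda^4}$.

Thus, from (9b) $V(S^2) = \frac{1}{n}\left(\frac{9}{\lambda^4} - \frac{n-3}{n-1} \cdot \frac{1}{\lambda^4}\right) = \frac{8n - 6}{n(n-1)\lambda^4} = \frac{2(4n-3)}{n(n-1)\lambda^4}$.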
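The arithmetic in EX 22 can be reproduced exactly with rational numbers. The sketch below implements the relation between central and raw moments from this section and the variance formula in (9b); the function names, $\lambda = 1$ and $n = 5$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def central_moment(r, alpha):
    """mu_r = sum_i C(r, i) * alpha_i * (-mu)^(r - i), where mu = alpha_1."""
    mu = alpha(1)
    return sum(comb(r, i) * alpha(i) * (-mu) ** (r - i) for i in range(r + 1))

def var_s2(n, mu4, sigma2):
    """V(S^2) = (mu_4 - (n - 3)/(n - 1) * sigma^4) / n, as in (9b)."""
    return (mu4 - Fraction(n - 3, n - 1) * sigma2 ** 2) / n

lam = Fraction(1)   # illustrative rate

def alpha(r):
    """Raw moments of Exponential(lam): alpha_r = r! / lam^r."""
    return Fraction(factorial(r)) / lam ** r

mu2 = central_moment(2, alpha)   # variance, 1/lam^2
mu4 = central_moment(4, alpha)   # fourth central moment, 9/lam^4
v = var_s2(5, mu4, mu2)          # V(S^2) for n = 5
print(mu2, mu4, v)
```

Using `Fraction` avoids rounding, so the result can be compared directly with the closed form $2(4n-3)/(n(n-1)\lambda^4)$.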


3.3 Asymptotic and approximate results in sampling theory 


Sometimes it is not possible, or very hard, to find the exact distribution of a statistic $T_n$ based on $n$ observations. In such a case one may try to find the asymptotic distribution when $n$ is large. If this is also a stumbling block one can try to find at least approximate expressions for expectations and variances of $T_n$. In this section we present some ways to handle these problems.


3.3.1 Convergence in probability and in distribution 


By convergence of $T_n$ in probability towards a constant $c$ when $n \to \infty$ we mean that the probability that the distance between $T_n$ and $c$ exceeds any given positive number tends to zero with increasing $n$. In symbols this is expressed by $T_n \xrightarrow{P} c$, as $n \to \infty$. In practice it is often cumbersome to verify that the latter probability tends to zero. Then one may use the following theorem.


$E(T_n) = c$ and $V(T_n) \to 0 \Rightarrow T_n \xrightarrow{P} c$ (10)


By convergence in distribution (or in law) we mean that the cdf of $T_n$ tends to the cdf of $T$, say. In symbols we express this by $T_n \xrightarrow{D} T$. An example is the CLT given in (6).




Some important results 


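Let $g$ be a continuous function, then the following relations hold. (For proofs the reader is referred to Ch 2c.4 and Ch 6a.2 in Rao 1965.)


$T_n \xrightarrow{P} c \Rightarrow g(T_n) \xrightarrow{P} g(c)$ (11)
$T_n \xrightarrow{D} T \Rightarrow g(T_n) \xrightarrow{D} g(T)$


$T_n \xrightarrow{D} T$ and $U_n \xrightarrow{P} c \Rightarrow \begin{cases} T_n + U_n \xrightarrow{D} T + c \\ T_n \cdot U_n \xrightarrow{D} T \cdot c \\ T_n / U_n \xrightarrow{D} T / c \end{cases}$ (12)


Let $\theta$ be a parameter and let the variance of $T_n$ be $\sigma^2(\theta)$, a function of $\theta$. Then

$\sqrt{n}(T_n - \theta) \xrightarrow{D} Z \sim N(0, \sigma^2(\theta)) \Rightarrow \sqrt{n}(g(T_n) - g(\theta)) \xrightarrow{D} X \sim N(0, [g'(\theta)]^2 \sigma^2(\theta))$ (13)


We now consider applications of (10)-(13).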


EX 23 Let $(Y_i)_{i=1}^{n}$ be iid with $Y_i \sim \text{Bernoulli}(p)$. Put $\hat{p}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ = Relative frequency of 'success' after $n$ trials.

a) Show that $\hat{p}_n \xrightarrow{P} p$, as $n \to \infty$.

This follows from (10) since [cf. 2.2.1 (1)] $E(\hat{p}_n) = p$ and $V(\hat{p}_n) = \frac{p(1-p)}{n} \to 0$ as $n \to \infty$.
n 


The fact that $\hat{p}_n \xrightarrow{P} p$ has been termed the law of large numbers. It can be empirically verified by throwing a thumbtack a number of times and noting the relative frequency of the event 'tip of the tack is up'. The author, with his particular type of thumbtack, found that the frequency stabilized around $p = 0.6$ after about 20 trials. The outcome of course depends on the experimental conditions, but the reader is encouraged to repeat it, with a shoe or a coin. It is instructive to plot the relative frequency of the event on the Y-axis against the number of trials on the X-axis.
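The thumbtack experiment can be imitated on a computer. The sketch below records the running relative frequency of 'success' in simulated Bernoulli trials; the success probability 0.6, the number of trials and the seed are assumptions chosen to mirror the example, and the pseudo-random trials only stand in for physical throws.

```python
import random

def running_frequency(p, n_trials, seed=0):
    """Relative frequency of success after 1, 2, ..., n_trials Bernoulli(p) trials."""
    rng = random.Random(seed)
    successes = 0
    freqs = []
    for n in range(1, n_trials + 1):
        if rng.random() < p:      # one Bernoulli(p) trial
            successes += 1
        freqs.append(successes / n)
    return freqs

freqs = running_frequency(p=0.6, n_trials=100000)
for n in (10, 100, 1000, 10000, 100000):
    print(n, freqs[n - 1])
```

Plotting `freqs` against the trial number gives the kind of stabilizing curve described in the text.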


b) Show that $\frac{\hat{p}_n - p}{\sqrt{p(1-p)/n}} \xrightarrow{D} Z \sim N(0,1)$, as $n \to \infty$.

The left hand side is, after multiplication with $n$ in both numerator and denominator, $\frac{\sum_{i=1}^{n} Y_i - np}{\sqrt{np(1-p)}}$. The CLT in (6) now gives the result.
Vnp(l— p) 


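c) Show that $\frac{\hat{p}_n - p}{\sqrt{\hat{p}_n(1-\hat{p}_n)/n}} \xrightarrow{D} Z \sim N(0,1)$, as $n \to \infty$.

The left hand side can be written

$\frac{(\hat{p}_n - p)/\sqrt{p(1-p)/n}}{\sqrt{\dfrac{\hat{p}_n(1-\hat{p}_n)/n}{p(1-p)/n}}}$, where the numerator $\xrightarrow{D} Z \sim N(0,1)$ and the denominator $\xrightarrow{P} 1$.

The convergence in the numerator was shown in b). To prove the convergence in the denominator, notice that by (11) $\hat{p}_n \xrightarrow{P} p \Rightarrow g(\hat{p}_n) = \hat{p}_n(1 - \hat{p}_n) \xrightarrow{P} g(p) = p(1-p)$. Finally, the result follows from (12).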


Comment: The difference between the expressions in b) and c) is that in c) we have replaced $p$ by an estimator $\hat{p}_n$ in the denominator. This will simplify calculations of confidence intervals (cf. Ch. 5). However, $n$ in c) needs to be much larger than in b) for the approximation to normality to hold. If $p$ is not too far from 0.5, then $n$ about 50 is sufficient for normality in b), while $n$ perhaps larger than 5000 may be required in c).


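d) Show that $\frac{\sqrt{n}(\ln \hat{p}_n - \ln p)}{\sqrt{(1-p)/p}} \xrightarrow{D} Z \sim N(0,1)$.

Multiplying the left hand side in b) by $\sqrt{p(1-p)}$ and using (12) gives $\sqrt{n}(\hat{p}_n - p) \xrightarrow{D} \sqrt{p(1-p)} \cdot Z \sim N(0, p(1-p))$. Since $\ln x$ is continuous with derivative $1/x$ it follows from (13) that $\sqrt{n}(\ln \hat{p}_n - \ln p) \xrightarrow{D} N\left(0, \left(\frac{1}{p}\right)^2 \cdot p(1-p)\right) = N\left(0, \frac{1-p}{p}\right)$, from which d) follows.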
EX 24 Let $(Y_i)_{i=1}^{n}$ be iid variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Show that

$\frac{n(\bar{Y} - \mu)^2}{\sigma^2} \xrightarrow{D} \chi^2(1)$.

This follows because $\frac{\sqrt{n}(\bar{Y} - \mu)}{\sigma} \xrightarrow{D} Z \sim N(0,1)$ according to the CLT in (6). From (11), $\frac{n(\bar{Y} - \mu)^2}{\sigma^2} \xrightarrow{D} Z^2 \sim \chi^2(1)$.


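EX 25 Let $(Y_i)_{i=1}^{n}$ be iid variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Show that

$\frac{\bar{Y} - \mu}{S/\sqrt{n}} \xrightarrow{D} Z \sim N(0,1)$.

Dividing numerator and denominator by $\sigma/\sqrt{n}$ yields

$\frac{(\bar{Y} - \mu)/(\sigma/\sqrt{n})}{(S/\sqrt{n})/(\sigma/\sqrt{n})} = \frac{(\bar{Y} - \mu)/(\sigma/\sqrt{n})}{S/\sigma}$, where the numerator $\xrightarrow{D} Z \sim N(0,1)$ and the denominator $\xrightarrow{P} 1$, and the result follows from (12).

Here $S/\sigma \xrightarrow{P} 1$ for two reasons: (i) $V(S^2) \to 0$ [cf. (9b)] $\Rightarrow S^2 \xrightarrow{P} \sigma^2$ [cf. (10)]. (ii) $g(S^2) = \sqrt{S^2/\sigma^2} \xrightarrow{P} \sqrt{\sigma^2/\sigma^2} = 1$ [cf. (11)].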


3.3.2 Approximations of moments 


Let $Y_i$, $i = 1, 2$ be two random variables with means $\mu_i$ and variances $\sigma_i^2$ and with covariance $\sigma_{12}$. From Taylor expansions of a function $g$ of the variables, one can show the following. (Cf. Casella & Berger 1990, pp. 328-331.)


$E(g(Y_i)) \approx g(\mu_i) + \frac{1}{2} g''(\mu_i)\sigma_i^2$, $V(g(Y_i)) \approx [g'(\mu_i)]^2 \sigma_i^2$

$\text{Cov}(g_1(Y_1), g_2(Y_2)) \approx g_1'(\mu_1)\, g_2'(\mu_2)\, \sigma_{12}$ (14)


EX 26 


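a) Let $Y \sim \text{Gamma}(\lambda, k)$. Determine the approximate mean and variance of $\ln Y$.

From 2.2.2 (2) we know that the mean and variance are $\mu = k/\lambda$ and $\sigma^2 = k/\lambda^2$, respectively. The derivatives of $g(y) = \ln y$ are $g'(y) = 1/y$ and $g''(y) = -1/y^2$. Thus (14) gives

$E(\ln Y) \approx \ln\mu + \frac{1}{2}\left(-\frac{1}{\mu^2}\right) \cdot \sigma^2 = \ln(k/\lambda) - \frac{1}{2k}$, $V(\ln Y) \approx \left(\frac{1}{\mu}\right)^2 \cdot \sigma^2 = \frac{1}{k}$.

b) $(Y_i)_{i=1}^{n}$ is a sequence of iid variables, each being distributed Gamma($\lambda$, $k$). Determine the approximate mean and variance of $\ln\bar{Y}$, where as usual $\bar{Y} = \sum_{i=1}^{n} Y_i/n$.

From 3.1 (6) we know that $\sum Y_i \sim G = \text{Gamma}(\lambda, nk)$. Now, $\ln\bar{Y} = \ln(G/n) = \ln G - \ln n$. From a) above we get $E(\ln G) \approx \ln(nk/\lambda) - \frac{1}{2nk}$ and $V(\ln G) \approx \frac{1}{nk}$. Thus,

$E(\ln\bar{Y}) \approx \ln(nk/\lambda) - \frac{1}{2nk} - \ln n = \ln(k/\lambda) + \ln n - \frac{1}{2nk} - \ln n = \ln(k/\lambda) - \frac{1}{2nk}$

$V(\ln\bar{Y}) \approx V(\ln G) + V(\ln n) = V(\ln G) + 0 = \frac{1}{nk}$

Notice that, as $n \to \infty$, $E(\ln\bar{Y}) \to \ln E(Y_i)$ and $V(\ln\bar{Y}) \to 0$.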
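The quality of the approximations in EX 26 a) can be gauged by simulation. The sketch below compares the approximate mean and variance of $\ln Y$ with simulated values; the parameter values $\lambda = 2$, $k = 10$, the number of replicates and the seed are assumptions for illustration. Python's `random.gammavariate(alpha, beta)` takes a shape and a scale, so the scale is set to $1/\lambda$.

```python
import math
import random

def approx_log_gamma_moments(lam, k):
    """Approximations from (14): E(ln Y) ~ ln(k/lam) - 1/(2k), V(ln Y) ~ 1/k."""
    return math.log(k / lam) - 1.0 / (2.0 * k), 1.0 / k

def simulated_log_gamma_moments(lam, k, reps=100000, seed=2):
    """Sample mean and variance of ln Y for Y ~ Gamma(lam, k), lam being a rate."""
    rng = random.Random(seed)
    logs = [math.log(rng.gammavariate(k, 1.0 / lam)) for _ in range(reps)]
    m = sum(logs) / reps
    v = sum((x - m) ** 2 for x in logs) / (reps - 1)
    return m, v

approx_mean, approx_var = approx_log_gamma_moments(lam=2.0, k=10.0)
sim_mean, sim_var = simulated_log_gamma_moments(lam=2.0, k=10.0)
print(approx_mean, sim_mean)
print(approx_var, sim_var)
```

For small $k$ the agreement is expected to be poorer, since the Taylor expansion behind (14) is then less accurate.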
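A generalization of (14) to a function of two variables is

$E(g(Y_1, Y_2)) \approx g(\mu_1, \mu_2) + \frac{1}{2}\frac{\partial^2 g(\mu_1, \mu_2)}{\partial y_1^2} \cdot \sigma_1^2 + \frac{1}{2}\frac{\partial^2 g(\mu_1, \mu_2)}{\partial y_2^2} \cdot \sigma_2^2 + \frac{\partial^2 g(\mu_1, \mu_2)}{\partial y_1 \partial y_2} \cdot \sigma_{12}$

$V(g(Y_1, Y_2)) \approx \left(\frac{\partial g(\mu_1, \mu_2)}{\partial y_1}\right)^2 \sigma_1^2 + \left(\frac{\partial g(\mu_1, \mu_2)}{\partial y_2}\right)^2 \sigma_2^2 + 2\,\frac{\partial g(\mu_1, \mu_2)}{\partial y_1} \cdot \frac{\partial g(\mu_1, \mu_2)}{\partial y_2} \cdot \sigma_{12}$ (15)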

EX 27 


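Let $Y_i$, $i = 1, 2$ be correlated variables with means $\mu_i$ and variances $\sigma_i^2$ and with covariance $\sigma_{12}$. Derive the approximate mean and variance of $R = Y_1/Y_2$.

We start by computing the derivatives of the function $g = y_1/y_2$ (cf. derivation rule (4) in Ch. 2.3.3).

$\frac{\partial g}{\partial y_1} = \frac{1}{y_2}$, $\frac{\partial^2 g}{\partial y_1^2} = 0$, $\frac{\partial g}{\partial y_2} = -\frac{y_1}{y_2^2}$, $\frac{\partial^2 g}{\partial y_2^2} = \frac{2y_1}{y_2^3}$, $\frac{\partial^2 g}{\partial y_1 \partial y_2} = -\frac{1}{y_2^2}$, thus

$E\left(\frac{Y_1}{Y_2}\right) \approx \frac{\mu_1}{\mu_2} + \frac{1}{2} \cdot 0 \cdot \sigma_1^2 + \frac{1}{2} \cdot \frac{2\mu_1}{\mu_2^3}\sigma_2^2 - \frac{1}{\mu_2^2}\sigma_{12} = \frac{\mu_1}{\mu_2}\left(1 + \frac{\sigma_2^2}{\mu_2^2} - \frac{\sigma_{12}}{\mu_1\mu_2}\right)$

$V\left(\frac{Y_1}{Y_2}\right) \approx \frac{1}{\mu_2^2}\sigma_1^2 + \frac{\mu_1^2}{\mu_2^4}\sigma_2^2 - 2\frac{\mu_1}{\mu_2^3}\sigma_{12} = \left(\frac{\mu_1}{\mu_2}\right)^2\left(\frac{\sigma_1^2}{\mu_1^2} + \frac{\sigma_2^2}{\mu_2^2} - \frac{2\sigma_{12}}{\mu_1\mu_2}\right)$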
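The two approximations derived in EX 27 are simple to evaluate numerically. The following sketch wraps them in a small function; the function name and the input values are assumptions chosen so that the arithmetic can be followed by hand.

```python
def ratio_moments(mu1, mu2, var1, var2, cov12):
    """Approximate mean and variance of R = Y1/Y2 from the Taylor expansion."""
    r = mu1 / mu2
    mean = r * (1.0 + var2 / mu2 ** 2 - cov12 / (mu1 * mu2))
    var = r ** 2 * (var1 / mu1 ** 2 + var2 / mu2 ** 2
                    - 2.0 * cov12 / (mu1 * mu2))
    return mean, var

# Illustrative values: uncorrelated variables with small relative dispersion.
mean_r, var_r = ratio_moments(mu1=2.0, mu2=4.0, var1=0.04, var2=0.16, cov12=0.0)
print(mean_r, var_r)
```

A positive covariance between $Y_1$ and $Y_2$ lowers both the approximate mean and the approximate variance of the ratio, as the signs in the formulas show.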


3.4 Final words 


Uppercase letters or lowercase letters? Uppercase letters, such as $S^2$ for a sample variance, are used for statistics when we want to stress that the quantity has a distribution. Lowercase letters, such as $s^2$, are used for specific values of a statistic.


The distribution of a statistic is called a sampling distribution. This is a creation by statisticians for the purpose of drawing conclusions about parameters in the population and it has nothing to do with the real world. Distributions that are intended to reflect facts in nature or society are called population distributions.


Asymptotic results are obtained as a limit, e.g. when $n \to \infty$ and $p \to 0$ in the Poisson approximation of the Binomial distribution. Approximate results just mean that they are not exact.


Knowledge about sampling distributions is the key to understanding the content of the following chapters. It is therefore important that you are comfortable with the properties in (1)-(10), and also with Cochran's theorem.


We have assumed that there is a given random sample. This can be achieved in a variety of ways. In this book we do not concern ourselves with how the sample has been collected. For readers interested in these matters there is a huge amount of literature in the field. (See e.g. Scheaffer et al., 2012.)
Supplementary Exercises, Ch. 3


EX 28 Let $(Y_i)_{i=1}^{n}$ be iid variables. Find the cdf and density of the smallest observation $Y_{(1)}$ if $Y_i \sim \text{Uniform}[0, b]$.


EX 29 Let $(Y_i)_{i=1}^{n}$ be iid variables. Find the cdf and density of the largest observation $Y_{(n)}$ if $Y_i \sim \text{Exponential}(\lambda)$.


EX 30 Let $Y \sim \text{Binomial}(n, p)$ and put $\hat{p} = Y/n$ so that $E(\hat{p}) = p$ and $V(\hat{p}) = p(1-p)/n$. As an estimator of $V(\hat{p})$ one may use $\hat{V}(\hat{p}) = \hat{p}(1-\hat{p})/n$. Show that the exact mean $E(\hat{V}(\hat{p}))$ and the approximate mean obtained from (14) are identical.


EX 31 In medical statistics one often wants to study whether a factor F causes a disease. Data from two independent samples of sizes $n_1$ and $n_2$ can be summarized in the following frequency table:


                Diseased    Not-Diseased    Total
F is present    $Y_1$       $n_1 - Y_1$     $n_1$
F is absent     $Y_2$       $n_2 - Y_2$     $n_2$


Data are analyzed by comparing the Relative Risk $\hat{R} = \hat{p}_1/\hat{p}_2$, where $\hat{p}_i = Y_i/n_i$, $i = 1, 2$, with the hypothetical value of 1, which is obtained if F does not cause the disease. The variance of $\hat{R}$ is estimated by

$\hat{V}(\hat{R}) = \hat{R}^2\left(\frac{n_1 - Y_1}{n_1 Y_1} + \frac{n_2 - Y_2}{n_2 Y_2}\right)$

Justify this expression by using the result in EX 27.

[Hint: Use the fact that $Y_1$ and $Y_2$ can be treated as two independent variables that are $\sim \text{Binomial}(n_i, p_i)$, $i = 1, 2$.]


EX 32 The sample variance $S^2$ is in general unbiased for $\sigma^2$ (cf. (9b)). However, $S$ is not in general unbiased for $\sigma$. Determine approximate expressions for $E(S)$ and $V(S)$ in the following cases:

a) $(Y_i)_{i=1}^{n}$ are iid with expectation $\mu$ and variance $\sigma^2$ with a general distribution for $Y_i$.
b) As in a), but with $Y_i \sim N(\mu, \sigma^2)$.


4 Point estimation 


In this chapter we deal with the problem of how to estimate an unknown characteristic in the population based on a sample of $n$ observations. Focus will be on the estimation of parameters, such as the variance $\sigma^2$ in a normal distribution or the upper point $b$ in a Uniform distribution. We also briefly consider the estimation of functions of parameters and other quantities such as probability and cdf. First some concepts are introduced and then we discuss some requirements on good estimators. Finally some estimation methods are presented and evaluated.


4.1 Concepts 


A statistic $T$ is a function of the random variables $Y_1, \dots, Y_n$ in a sample. A point estimator is a statistic that is used to estimate the value of an unknown parameter in the population, in general denoted $\theta$. A point estimate $t$ is a numerical value of $T$, obtained in a specific sample.


In (1a) and (1b) we introduced the concept of probability of a random sample of independent observations. This is a function of the variable values $y_1, \dots, y_n$. If we instead consider it as a function of the parameter $\theta$, it is termed Likelihood $L(\theta) = L(y_1, \dots, y_n; \theta)$. When we want to study the long-run behavior of the likelihood over all possible drawn samples, we use the notation $L(Y_1, \dots, Y_n; \theta)$. In the latter case $L$ is a random variable.


Intuitively the observations in a sample contain information about @ in some sense. (The statistical 
concept of information will be defined formally below.) E.g. given the body weights in kg, 75, 50, 90, 
72, 78 of five persons drawn from a certain population, we conclude that the population mean should 
be slightly larger than 70, but also that the dispersion is quite large. Sometimes all information about 0 
is contained in a single statistic T. In such a case T is termed a sufficient statistic for 0. If we have found 
a sufficient statistic T for 0 we can, roughly speaking, skip the original observations and only use T for 


making inference about 0. The following factorization criterion can be used to find a sufficient statistic: 


Assume that the likelihood L can be factorized into two parts such that

$$L(y_1,\dots,y_n;\theta) = L_1(T,\theta)\cdot L_2(y_1,\dots,y_n), \qquad (16)$$

where $L_1$ depends only on T and $\theta$, and $L_2$ does not depend on $\theta$ but possibly on the observations.
Then T is sufficient for $\theta$.


More generally, $T_1,\dots,T_p$ are simultaneous sufficient statistics for $\theta_1,\dots,\theta_p$ if (16) holds with T and $\theta$ being
replaced by the corresponding vectors. The following results can be useful:

Let g be a continuous function. Then: T is sufficient for $\theta$ $\Rightarrow$ g(T) is sufficient for $g(\theta)$. (17)


A sufficient statistic is unique. (There can't be several sufficient statistics for a parameter besides 
functions of T.) (18) 
EX 33 $(Y_i)_{i=1}^n$ are independent variables in a random sample. Find sufficient statistics in the following cases:
(See Ch. 2.2 to find the various distributions.)

a) $Y_i \sim \text{Binomial}(n_i, p)$

$$L = \prod_{i=1}^n \binom{n_i}{y_i} p^{y_i}(1-p)^{n_i-y_i} = p^{\sum y_i}(1-p)^{\sum n_i-\sum y_i}\cdot\prod_{i=1}^n \binom{n_i}{y_i}.$$

From this we conclude that the statistic $\sum_{i=1}^n Y_i$ is sufficient for p.


b) $Y_i \sim \text{Geometric}(p)$

$$L = \prod_{i=1}^n p(1-p)^{y_i-1} = p^n(1-p)^{\sum y_i-n}\cdot 1.$$

Thus, $\sum_{i=1}^n Y_i$ is sufficient for p.


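c) $Y_i \sim \text{Poisson}(\lambda)$

$$L = \prod_{i=1}^n \frac{\lambda^{y_i}}{y_i!}e^{-\lambda} = \lambda^{\sum y_i}e^{-n\lambda}\cdot\frac{1}{\prod_{i=1}^n y_i!}.$$

Thus, $\sum_{i=1}^n Y_i$ is sufficient for $\lambda$.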


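d) $Y_i$ has the density $f(y) = 2\lambda y e^{-\lambda y^2}$. This is called the Weibull distribution and is used as a model for life
lengths of materials.

$$L = \prod_{i=1}^n 2\lambda y_i e^{-\lambda y_i^2} = \lambda^n e^{-\lambda\sum y_i^2}\cdot 2^n\prod_{i=1}^n y_i.$$

Thus, $\sum_{i=1}^n Y_i^2$ is sufficient for $\lambda$.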


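e) $Y_i \sim \text{Gamma}(\lambda,k)$

$$L = \prod_{i=1}^n \frac{\lambda^k}{\Gamma(k)} y_i^{k-1}e^{-\lambda y_i} = \frac{\lambda^{nk}}{\Gamma(k)^n}\left(\prod_{i=1}^n y_i\right)^{k-1} e^{-\lambda\sum y_i}\cdot 1.$$

Thus, $\left(\sum_{i=1}^n Y_i,\ \prod_{i=1}^n Y_i\right)$ are simultaneous sufficient for $(\lambda,k)$.

f) $Y_i \sim N(\mu,\sigma^2)$

$$L = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(y_i-\mu)^2/(2\sigma^2)} = \frac{1}{(2\pi)^{n/2}(\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu)^2\right).$$

Here

$$\sum_{i=1}^n (y_i-\mu)^2 = \sum_{i=1}^n \big((y_i-\bar y)+(\bar y-\mu)\big)^2 = \sum_{i=1}^n (y_i-\bar y)^2 + \sum_{i=1}^n (\bar y-\mu)^2 + 2\sum_{i=1}^n (y_i-\bar y)(\bar y-\mu) = \sum_{i=1}^n (y_i-\bar y)^2 + n(\bar y-\mu)^2,$$

since the last term is zero (cf. EX 13). From this it follows that
$\left(\bar Y,\ \sum_{i=1}^n (Y_i-\bar Y)^2\right)$ are simultaneous sufficient for $(\mu,\sigma^2)$.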
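The factorization in EX 33 c) can be checked numerically. The Python sketch below is an illustration that is not part of the original text; the function name `poisson_likelihood` and the two samples are assumptions chosen for the check. It takes two samples with the same sum and shows that the ratio of their likelihoods is the same for every $\lambda$, which is what sufficiency of $\sum Y_i$ implies.

```python
import math

def poisson_likelihood(sample, lam):
    """Likelihood of an iid Poisson(lam) sample: product of lam^y * exp(-lam) / y!."""
    value = 1.0
    for y in sample:
        value *= lam ** y * math.exp(-lam) / math.factorial(y)
    return value

# Two hypothetical samples of size 4 with the same sum (both sum to 10).
sample_a = [1, 2, 3, 4]
sample_b = [0, 0, 5, 5]

# lam enters the likelihood only through sum(y) and n, so the ratio
# L(a; lam) / L(b; lam) reduces to prod(b!) / prod(a!) for every lam.
ratios = [poisson_likelihood(sample_a, lam) / poisson_likelihood(sample_b, lam)
          for lam in (0.5, 1.0, 2.5, 7.0)]
print(ratios)
```

The constant ratio is the quotient of the two $L_2$ factors in (16); the $L_1$ factors cancel because they depend on the data only through the common sum.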




4.2 Requirements on estimators 


In order for an estimator $T_n$ of $\theta$, based on n observations, to be considered as good one usually requires
the following:

- $T_n$ is consistent for $\theta$. This means that the estimator converges in probability towards the
parameter, $T_n \xrightarrow{P} \theta$, as $n \to \infty$. Remember from (10), Ch. 3.3, that a sufficient condition
for this is that $E(T_n) = \theta$ and $V(T_n) \to 0$. Estimators that are not consistent are useless in the
sense that we do not necessarily get closer to $\theta$ by increasing n.

- $T_n$ is unbiased for $\theta$, which means that $E(T_n) = \theta$. The difference $E(T_n)-\theta$ is the bias of the
estimator, denoted bias($\theta$). The dispersion of $T_n$ around $\theta$ can be measured by the Mean
Squared Error (MSE) of $T_n$, $E(T_n-\theta)^2 = V(T_n) + (\text{bias}(\theta))^2$.

- $T_n$ is a minimum variance estimator (MVE), which means that $V(T_n)$ is smaller than the
variance of all other estimators. A MVE is unique, so there can only be one estimator with
smallest variance.
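The decomposition MSE = variance + bias$^2$ can be illustrated by simulation. The following Python sketch is not from the original text; the helper `mse_decomposition`, the seed and the settings (n = 5, $\sigma^2$ = 4, 2000 replicates) are assumptions made for the illustration. The estimator studied is the sample variance with divisor n, which is biased for $\sigma^2$.

```python
import random
import statistics

def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) for a list of simulated estimates of theta."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - theta
    variance = statistics.pvariance(estimates, mu=mean_est)  # divisor = number of replicates
    mse = statistics.fmean([(t - theta) ** 2 for t in estimates])
    return mse, variance, bias

rng = random.Random(1)          # fixed seed, so the run is reproducible
n, sigma2, reps = 5, 4.0, 2000  # hypothetical settings

# Estimator under study: the sample variance with divisor n (biased for sigma^2).
estimates = []
for _ in range(reps):
    sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    estimates.append(statistics.pvariance(sample))

mse, variance, bias = mse_decomposition(estimates, sigma2)
print(mse, variance + bias ** 2, bias)
```

The first two printed numbers agree up to rounding, and the simulated bias is negative, in line with the theoretical bias $-\sigma^2/n$ of this estimator.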


The problem of finding a MVE is rather complicated. Before treating this we consider some results about
derivatives of the log-likelihood function. The function $\frac{d\ln L}{d\theta}$ is called a score function and it plays an
important role in statistical inference. From Eq. (1a) and Eq. (1b) it follows that

$$\frac{d\ln L}{d\theta} = \sum_{i=1}^n \frac{d\ln p(y_i,\theta)}{d\theta}\ \text{(discrete case)} \quad\text{and}\quad \frac{d\ln L}{d\theta} = \sum_{i=1}^n \frac{d\ln f(y_i,\theta)}{d\theta}\ \text{(continuous case)} \qquad (19)$$


In order to obtain further results in the continuous case we set up the following conditions:

a) The range of y-values in $f(y,\theta)$ does not depend on $\theta$.
b) $\frac{d\ln f}{d\theta}$ and $\frac{d^2\ln f}{d\theta^2}$ are continuous. (20)

Notice that (20) does not hold for $Y \sim \text{Uniform}[0,b]$ with density $f(y,b) = 1/b$, $0 < y < b$, but it holds for all
other densities we have considered so far.

If the conditions (a) and (b) in (20) hold then

a) $E\left(\frac{d\ln L}{d\theta}\right) = 0$
b) $I(\theta) = V\left(\frac{d\ln L}{d\theta}\right) = -E\left(\frac{d^2\ln L}{d\theta^2}\right)$ (21)
The function $I(\theta)$ is the information about $\theta$ that is contained in a sample of size n. A solution to the
problem of finding a MVE is given by the following theorem, called the Information inequality or the
Cramer-Rao inequality after two of its discoverers who published the result in 1945.

$$V(T_n) \ge \frac{\left(1+\frac{d\,\text{bias}(\theta)}{d\theta}\right)^2}{I(\theta)} = (\text{if } T_n \text{ is unbiased}) = \frac{1}{I(\theta)} \qquad (22)$$


The lower limit in (22) for the variance is called the Cramer-Rao (C-R) limit. The limit may not be 
attainable for a MVE, but no estimator can have smaller variance. Thus, if we have found an estimator 
with a variance that equals the C-R limit, then we have found a MVE. But, if the variance of an estimator 
is larger than the C-R limit, the estimator may still be a MVE. The search for a MVE in the latter case 
can be complicated. Some help may be obtained from a theorem of Rao and Blackwell (Casella & Berger 
1990, p. 316), but this is beyond the level of this book. 
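EX 34 Let $(Y_i)_{i=1}^n$ be iid variables where $Y_i \sim \text{Exponential}(\lambda)$.

a) Find an unbiased estimator of $E(Y_i) = 1/\lambda$ that is based on the smallest observation $Y_{(1)}$. Is this estimator
consistent?

$F_Y(y) = 1-e^{-\lambda y} \Rightarrow$ [See (8)] $\Rightarrow F_{Y_{(1)}}(y) = 1-[1-(1-e^{-\lambda y})]^n = 1-e^{-n\lambda y} \Rightarrow$
$Y_{(1)} \sim \text{Exponential}(n\lambda) \Rightarrow E(Y_{(1)}) = 1/(n\lambda)$. Thus, $T_n = nY_{(1)}$ is unbiased for $1/\lambda$.

b) The variance of this estimator is $V(T_n) = n^2V(Y_{(1)}) = n^2\cdot\frac{1}{(n\lambda)^2} = 1/\lambda^2$, which does not tend to 0 as
$n \to \infty$. In fact $T_n \sim \text{Exponential}(\lambda)$ for every n, so the estimator is not consistent.

c) Find a sufficient statistic for $\lambda$.

$$L = \prod_{i=1}^n \lambda e^{-\lambda y_i} = \lambda^n e^{-\lambda\sum y_i}\cdot 1 \Rightarrow \sum_{i=1}^n Y_i \text{ is sufficient for } \lambda.$$

d) Determine the information about $\lambda$ that is contained in a sample of n observations and also the Cramer-Rao
(C-R) limit.

$$\ln L = \ln(\lambda^n) + \ln\left(e^{-\lambda\sum y_i}\right) = n\ln\lambda - \lambda\sum_{i=1}^n y_i \Rightarrow \frac{d\ln L}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^n y_i \Rightarrow \frac{d^2\ln L}{d\lambda^2} = -\frac{n}{\lambda^2}.$$

Thus, considering the latter as a random unit we get $I(\lambda) = -E\left(-\frac{n}{\lambda^2}\right) = \frac{n}{\lambda^2}$. The information increases with n
and decreases with increasing $\lambda$.

The C-R limit for any unbiased estimator of $\lambda$ is $\frac{1}{I(\lambda)} = \frac{\lambda^2}{n}$.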
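The conclusion in EX 34 a) and b), that $nY_{(1)}$ is unbiased for $1/\lambda$ but does not improve with n, can be illustrated by simulation. The Python sketch below is an addition for illustration only; the function name `simulate_scaled_minimum`, the seed and the settings ($\lambda$ = 2, 10000 replicates) are assumptions.

```python
import random
import statistics

def simulate_scaled_minimum(n, lam, reps, seed=0):
    """Simulate T_n = n * min(Y_1, ..., Y_n) for iid Exponential(lam) data.

    Returns the mean and the variance of T_n over the replicates."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        smallest = min(rng.expovariate(lam) for _ in range(n))
        values.append(n * smallest)
    return statistics.fmean(values), statistics.pvariance(values)

# With lam = 2 the theory gives E(T_n) = 1/lam = 0.5 and V(T_n) = 1/lam^2 = 0.25
# for every n, so the variance should not shrink when n grows from 5 to 50.
for n in (5, 50):
    mean_t, var_t = simulate_scaled_minimum(n, lam=2.0, reps=10000)
    print(n, round(mean_t, 3), round(var_t, 3))
```

Both sample sizes give a mean close to 0.5 and a variance close to 0.25, so taking more observations does not bring $T_n$ closer to $1/\lambda$.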


4.3 Estimation methods 


In this section we present some general methods to obtain estimators. Focus will be on the case with
a single parameter, but examples in the multi-parameter case are also given. It is required that the
estimators are unbiased. In this case the precision of an estimator can be measured by its variance. When
comparing estimators it is useful to use the concept of relative efficiency of an estimator $T_1$ relative to
another $T_2$, $RE = V(T_2)/V(T_1)$. Examples of REs for estimators produced by various methods are given
in the Supplementary Exercises of this chapter.


4.3.1 Method of Ordinary Least Squares (OLS) 


The method originates from the works by the mathematicians Legendre (1805) and Gauss (1809). Many 
students are familiar with this method as a way to fit a straight line to data points in the plane, but the 


method can be used in more general contexts. 
Given the variables $Y_i$ with expectations $E(Y_i) = g_i(\theta)$, i = 1, ..., n, consider the sum of squares
$SS = \sum_{i=1}^n [y_i-g_i(\theta)]^2$ and determine the value of $\theta$ that minimizes SS, say $\hat\theta_{OLS}$. This can be
obtained from the solution of $\frac{dSS}{d\theta} = 0$. By putting $\hat\theta_{OLS}$ into SS one gets an estimated sum of squares
$SSE = \sum_{i=1}^n [y_i-g_i(\hat\theta_{OLS})]^2$ which can be used for estimating dispersion.


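EX 35 $(Y_i)_{i=1}^n$ are iid with $Y_i \sim \text{Poisson}(\lambda)$. Find the OLS estimator $\hat\lambda_{OLS}$.

$$SS = \sum_{i=1}^n [y_i-\lambda]^2 \Rightarrow \frac{dSS}{d\lambda} = -2\sum_{i=1}^n [y_i-\lambda] = 0 \Rightarrow \sum_{i=1}^n y_i - n\lambda = 0 \Rightarrow \hat\lambda_{OLS} = \bar y.$$

Here we notice that $E(\hat\lambda_{OLS}) = \frac{1}{n}\sum_{i=1}^n E(Y_i) = \frac{1}{n}\cdot n\lambda = \lambda$ (Unbiased) and
$V(\hat\lambda_{OLS}) = \frac{1}{n^2}\sum_{i=1}^n V(Y_i) = \frac{1}{n^2}\cdot n\lambda = \frac{\lambda}{n}$.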


EX 36 $(Y_i)_{i=1}^n$ where $Y_i$ are independent with $E(Y_i) = \beta x_i$ and $V(Y_i) = \sigma^2$. This model is often called 'Linear
Regression through the Origin with constant variance'. Here each $x_i$ is fixed while $Y_i$ is random. Find the OLS
estimator of $\beta$.

$$SS = \sum_{i=1}^n [y_i-\beta x_i]^2 \Rightarrow \frac{dSS}{d\beta} = -2\sum_{i=1}^n x_i[y_i-\beta x_i] = 0 \Rightarrow \sum_{i=1}^n x_iy_i - \beta\sum_{i=1}^n x_i^2 = 0 \Rightarrow \hat\beta_{OLS} = \frac{\sum_{i=1}^n x_iY_i}{\sum_{i=1}^n x_i^2}.$$

$$E(\hat\beta_{OLS}) = \frac{1}{\sum x_i^2}\sum_{i=1}^n x_iE(Y_i) = \frac{1}{\sum x_i^2}\sum_{i=1}^n x_i\cdot\beta x_i = \beta \quad \text{(Unbiased)}$$

$$V(\hat\beta_{OLS}) = \frac{1}{\left(\sum x_i^2\right)^2}\sum_{i=1}^n x_i^2V(Y_i) = \frac{1}{\left(\sum x_i^2\right)^2}\sum_{i=1}^n x_i^2\cdot\sigma^2 = \frac{\sigma^2}{\sum_{i=1}^n x_i^2}.$$

Notice that the variance is small, i.e. the precision of the estimator is high, if the $x_i$-values are large. In practice
this means that if we e.g. want to estimate the relation between $Y_i$ = Fuel consumption and $x_i$ = Speed, we should
measure Fuel consumption when Speed is high.
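The closed form in EX 36 is easy to compute. The Python sketch below is an illustration that is not part of the original text; the function name `ols_through_origin` and the data are made up for the example. It returns the estimate, the estimated sum of squares SSE and $\sum x_i^2$, the quantity that governs the variance $\sigma^2/\sum x_i^2$.

```python
def ols_through_origin(x, y):
    """OLS estimate of beta in E(Y_i) = beta * x_i (regression through the origin).

    Returns (beta_hat, sse, sxx), where sxx = sum of x_i^2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    beta_hat = sxy / sxx
    sse = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    return beta_hat, sse, sxx

# Hypothetical data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

beta_hat, sse, sxx = ols_through_origin(x, y)
print(beta_hat, sse, sxx)
```

For these data $\sum x_iy_i = 59.7$ and $\sum x_i^2 = 30$, so the estimate is 1.99. Larger x-values increase $\sum x_i^2$ and therefore reduce the variance of the estimator.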


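When $E(Y_i) = g_i(\theta_1,\dots,\theta_p)$, a function of several parameters, we put $SS = \sum_{i=1}^n [y_i-g_i(\theta_1,\dots,\theta_p)]^2$. By
solving the equations $\frac{\partial SS}{\partial\theta_1} = 0, \dots, \frac{\partial SS}{\partial\theta_p} = 0$ we get the OLS estimators of the parameters.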
4.3.2 Method of Moments


This method was suggested by the statistician Karl Pearson in the late 1800s. The approach is to equate
the sample moments $\bar Y, S^2, \dots$ to the corresponding moments in the population, $E(Y) = g_1(\theta_1,\theta_2,\dots)$,
$V(Y) = g_2(\theta_1,\theta_2,\dots)$, ..., and solve for the parameters. The method has several deficiencies, but moment
estimates can be used when more ingenious methods require initial values in order to get iterative
solutions. An example of this is given in EX 44.


EX 37 $(Y_i)_{i=1}^n$ are iid and $Y_i \sim \text{Poisson}(\lambda)$. Here $E(Y_i) = \lambda = V(Y_i)$.

Obviously $\hat\lambda_{Mom} = \bar Y$. We might have used $\hat\lambda_{Mom} = S^2$, but this is less appropriate since $S^2$ has larger variance
than $\bar Y$.


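EX 38 $(Y_i)_{i=1}^n$ are iid and $Y_i \sim \text{Gamma}(\lambda,k)$. Here $E(Y) = k/\lambda$ and $V(Y) = k/\lambda^2$.

Put
$k/\lambda = \bar Y$ (1)
$k/\lambda^2 = S^2$ (2)

from which we get $k = \lambda\bar Y = \lambda^2S^2 \Rightarrow \hat\lambda_{Mom} = \bar Y/S^2$. The latter inserted into (1)
yields $\hat k_{Mom} = \bar Y^2/S^2$.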
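As a numerical illustration of EX 38, the Python sketch below (an addition, not part of the original text) evaluates the moment estimates from summary statistics. The function name `gamma_moment_estimates` is hypothetical; the numbers n = 100, $\sum y_i$ = 223.56 and $\sum y_i^2$ = 619.0525 are the ones quoted in EX 44 b).

```python
def gamma_moment_estimates(n, sum_y, sum_y2):
    """Method of Moments estimates (lambda, k) for Gamma(lambda, k) data,
    computed from n, the sum of y_i and the sum of y_i^2."""
    ybar = sum_y / n
    s2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)   # sample variance S^2
    lam_mom = ybar / s2                         # lambda = Ybar / S^2
    k_mom = ybar ** 2 / s2                      # k = Ybar^2 / S^2
    return lam_mom, k_mom

lam_mom, k_mom = gamma_moment_estimates(100, 223.56, 619.0525)
print(round(lam_mom, 2), round(k_mom, 2))
```

The value of $\hat k_{Mom}$ obtained this way, about 4.15, is the starting point used for the iterative ML solution in EX 44.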
4.3.3 Method of Best Linear Unbiased Estimator (BLUE)


The method has an unclear origin but seems to have been in use since the early 1900s. The approach is
simply to put $T_n = \sum_{i=1}^n a_iY_i$ and determine the constants $a_i$ such that $T_n$ is unbiased and has minimum
variance. This problem belongs to the field 'minimization under restrictions'. In the examples below we
show how solutions can be obtained in a simple way by using Lagrange's multiplier $\lambda$.


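EX 39 $(Y_i)_{i=1}^n$ are independent with $E(Y_i) = \theta$ and $V(Y_i) = V_i$. Find the BLUE of $\theta$.

$$T_n = \sum_{i=1}^n a_iY_i \Rightarrow E(T_n) = \sum_{i=1}^n a_iE(Y_i) = \theta\sum_{i=1}^n a_i = (\text{Put}) = \theta \Rightarrow \sum_{i=1}^n a_i = 1 \text{ or } \sum_{i=1}^n a_i - 1 = 0 \quad (i)$$

We now minimize $Q = V(T_n) + \lambda\left(\sum_{i=1}^n a_i - 1\right) = \sum_{i=1}^n a_i^2V_i + \lambda\left(\sum_{i=1}^n a_i - 1\right)$ with respect to $a_i$.

$$\frac{dQ}{da_i} = 2a_iV_i + \lambda = 0 \Rightarrow a_i = -\frac{\lambda}{2V_i} = \frac{\lambda'}{V_i} \quad (ii)$$

Putting this into (i) gives $\sum_{i=1}^n a_i = \lambda'\sum_{i=1}^n \frac{1}{V_i} = 1 \Rightarrow \lambda' = \frac{1}{\sum_{i=1}^n 1/V_i}$, which inserted into (ii) gives

$$a_i = \frac{1/V_i}{\sum_{j=1}^n 1/V_j}, \quad\text{so}\quad \hat\theta_{BLUE} = \frac{\sum_{i=1}^n Y_i/V_i}{\sum_{i=1}^n 1/V_i}.$$

The variance is $V(\hat\theta_{BLUE}) = \sum_{i=1}^n a_i^2V_i = \frac{1}{\sum_{i=1}^n 1/V_i}$.

Notice that if all variances are equal, $V_i = V$, then $\hat\theta_{BLUE} = \bar Y$. Otherwise BLUE estimates can't be computed in
practice without further assumptions about the variances.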
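The result of EX 39 is an inverse-variance weighted mean. The Python sketch below is an illustration that is not part of the original text; the function name `blue_common_mean` and the observations and variances are assumptions, and the variances are treated as known.

```python
def blue_common_mean(y, variances):
    """BLUE of a common mean theta when V(Y_i) = V_i is known.

    Returns (estimate, weights a_i, variance of the estimator)."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]        # a_i = (1/V_i) / sum_j (1/V_j)
    estimate = sum(w * yi for w, yi in zip(weights, y))
    return estimate, weights, 1.0 / total            # V(theta_BLUE) = 1 / sum(1/V_i)

# Hypothetical observations with known, unequal variances.
y = [10.0, 12.0, 20.0]
variances = [1.0, 2.0, 4.0]

estimate, weights, var_blue = blue_common_mean(y, variances)
print(estimate, weights, var_blue)
```

The weights sum to one, as required by the unbiasedness condition (i), and the least precise observation (variance 4) gets the smallest weight.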
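EX 40 $(Y_i)_{i=1}^n$ are independent with $E(Y_i) = \beta x_i$ and $V(Y_i) = \sigma^2x_i^p$ for some p.

This is the same situation as in EX 36 with the exception that $V(Y_i)$ is no longer constant, but changes with $x_i$. Find
the BLUE of $\beta$.

$$T_n = \sum_{i=1}^n a_iY_i \Rightarrow E(T_n) = \sum_{i=1}^n a_iE(Y_i) = \beta\sum_{i=1}^n a_ix_i = (\text{Put}) = \beta \Rightarrow \sum_{i=1}^n a_ix_i - 1 = 0 \quad (i)$$

$$V(T_n) = \sum_{i=1}^n a_i^2V(Y_i) = \sum_{i=1}^n a_i^2\sigma^2x_i^p = \sigma^2\sum_{i=1}^n a_i^2x_i^p.$$

$$Q = V(T_n) + \lambda\left(\sum_{i=1}^n a_ix_i - 1\right) \Rightarrow \frac{dQ}{da_i} = 2\sigma^2a_ix_i^p + \lambda x_i = 0 \Rightarrow a_i = -\frac{\lambda x_i^{1-p}}{2\sigma^2} = \lambda'x_i^{1-p} \quad (ii)$$

(ii) into (i) gives $\sum_{i=1}^n \lambda'x_i^{1-p}x_i = \lambda'\sum_{i=1}^n x_i^{2-p} = 1 \Rightarrow \lambda' = \frac{1}{\sum_{i=1}^n x_i^{2-p}}$, which inserted into (ii) gives

$$a_i = \frac{x_i^{1-p}}{\sum_{j=1}^n x_j^{2-p}}. \quad\text{BLUE for } \beta \text{ is thus}\quad \hat\beta_{BLUE} = \frac{\sum_{i=1}^n x_i^{1-p}Y_i}{\sum_{i=1}^n x_i^{2-p}}.$$

$$V(\hat\beta_{BLUE}) = \frac{1}{\left(\sum x_i^{2-p}\right)^2}\sum_{i=1}^n x_i^{2(1-p)}V(Y_i) = \frac{1}{\left(\sum x_i^{2-p}\right)^2}\sum_{i=1}^n x_i^{2-2p}\sigma^2x_i^p = \frac{\sigma^2}{\sum_{i=1}^n x_i^{2-p}}.$$

Special cases

p = 0, so $V(Y_i) = \sigma^2$: $\hat\beta_{BLUE} = \frac{\sum x_iY_i}{\sum x_i^2}$ with $V(\hat\beta_{BLUE}) = \frac{\sigma^2}{\sum x_i^2}$

p = 1, so $V(Y_i) = \sigma^2x_i$: $\hat\beta_{BLUE} = \frac{\sum Y_i}{\sum x_i} = \frac{\bar Y}{\bar x}$ with $V(\hat\beta_{BLUE}) = \frac{\sigma^2}{\sum x_i}$

p = 2, so $V(Y_i) = \sigma^2x_i^2$: $\hat\beta_{BLUE} = \frac{1}{n}\sum\frac{Y_i}{x_i}$ with $V(\hat\beta_{BLUE}) = \frac{\sigma^2}{n}$

This illustrates that estimators of the same parameter can differ very much, depending on which assumptions are
made about the data structure. In practice it is therefore important that such structures are investigated before the
estimation is done.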
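The three special cases of EX 40 can be reproduced with one function. The Python sketch below is added for illustration and is not part of the original text; the function name `blue_slope` and the data are assumptions. It evaluates $\hat\beta_{BLUE}$ and the factor $1/\sum x_i^{2-p}$ that multiplies $\sigma^2$ in the variance.

```python
def blue_slope(x, y, p):
    """BLUE of beta when E(Y_i) = beta * x_i and V(Y_i) = sigma^2 * x_i^p.

    Returns (beta_hat, variance_factor), where V(beta_hat) = sigma^2 * variance_factor."""
    numerator = sum(xi ** (1 - p) * yi for xi, yi in zip(x, y))
    denominator = sum(xi ** (2 - p) for xi in x)
    return numerator / denominator, 1.0 / denominator

# Hypothetical data with positive x-values.
x = [1.0, 2.0, 4.0]
y = [2.0, 3.0, 10.0]

for p in (0, 1, 2):
    print(p, blue_slope(x, y, p))
```

For these data the estimates are 48/21 (p = 0, the OLS estimator), 15/7 (p = 1, the ratio of sums) and 2.0 (p = 2, the mean of the ratios $Y_i/x_i$), so the assumed variance structure clearly changes the estimate.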
4.3.4 Method of Maximum Likelihood (ML)


The method was developed by the English statistician R.A. Fisher in a series of papers published during
the period 1912-1922. The idea is to determine the value of $\theta$ that maximizes the likelihood $L(\theta)$, thereby
finding 'the most likely value of the parameter given the outcomes $y_1,\dots,y_n$'. If $L(\theta) > 0$, the likelihood
has maximum for the same value as $\ln L(\theta)$ (cf. Ch. 2.3.3). Since the latter function is more convenient
to deal with, the ML estimator $\hat\theta_{ML}$ can be found by solving the likelihood equations

$$\frac{d\ln L}{d\theta} = 0 \text{ for one parameter, or } \frac{\partial\ln L}{\partial\theta_1} = 0, \dots, \frac{\partial\ln L}{\partial\theta_p} = 0 \text{ for many parameters.}$$


Some properties of ML estimators:


- The likelihood equations give the ML estimators if the conditions in (20) Ch. 4.2 hold.

- Let $g(\theta)$ be a continuous function of $\theta$. Then the ML estimator of $g(\theta)$ is $g(\hat\theta_{ML})$.

- ML estimators are seldom unbiased for finite n, but the bias can often easily be removed.

- If a sufficient statistic $T_n$ for $\theta$ exists, then $\hat\theta_{ML}$ is a function of $T_n$.

- ML estimators are consistent.

- In large samples ($n \to \infty$) $V(\hat\theta_{ML}) \approx 1/I(\theta)$, so ML estimators are MVEs in large samples.

- As $n \to \infty$, $\frac{\hat\theta_{ML}-\theta}{\sqrt{1/I(\theta)}} \xrightarrow{D} Z \sim N(0,1)$.


ML estimators can be MVEs, also in small samples as will be demonstrated in the Supplementary
Exercises of this Chapter.


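EX 41 $(Y_i)_{i=1}^n$ are iid with $Y_i \sim \text{Exponential}(\lambda)$. Show that the ML estimator of $\lambda$ is biased and correct it for bias.
Determine the variance of the corrected estimator and compare it with the C-R limit.

$$L(\lambda) = \prod_{i=1}^n \lambda e^{-\lambda y_i} = \lambda^ne^{-\lambda\sum y_i} \Rightarrow \ln L(\lambda) = n\ln(\lambda) - \lambda\sum_{i=1}^n y_i \Rightarrow \frac{d\ln L}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^n y_i = 0 \Rightarrow \lambda = \frac{n}{\sum_{i=1}^n y_i} = \frac{1}{\bar y}.$$

The corresponding estimator is $\hat\lambda_{ML} = \frac{n}{\sum_{i=1}^n Y_i} = \frac{1}{\bar Y}$.

To compute expectation and variance, notice that $\sum_{i=1}^n Y_i \sim \text{Gamma}(\lambda,n)$ and use the expression in Ch. 2.2.2 (2) with
r = -1 and k = n:

$$E(\hat\lambda_{ML}) = n\cdot E\left[\left(\sum Y_i\right)^{-1}\right] = n\lambda\frac{\Gamma(n-1)}{\Gamma(n)} = (\text{cf. Ch. 2.3.5}) = n\lambda\frac{\Gamma(n-1)}{(n-1)\Gamma(n-1)} = \frac{n}{n-1}\lambda.$$

This is not unbiased, but the bias can easily be removed by considering $\hat\lambda'_{ML} = \frac{n-1}{n}\hat\lambda_{ML} = \frac{n-1}{\sum_{i=1}^n Y_i}$.

$$V(\hat\lambda'_{ML}) = (n-1)^2V\left[\left(\sum Y_i\right)^{-1}\right] = (n-1)^2\left\{E\left[\left(\sum Y_i\right)^{-2}\right] - \left(E\left[\left(\sum Y_i\right)^{-1}\right]\right)^2\right\}$$

$$= (n-1)^2\lambda^2\left\{\frac{\Gamma(n-2)}{\Gamma(n)} - \left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^2\right\} = (n-1)^2\lambda^2\left\{\frac{1}{(n-1)(n-2)} - \frac{1}{(n-1)^2}\right\} = (n-1)^2\lambda^2\cdot\frac{1}{(n-1)^2(n-2)}.$$

Thus, $V(\hat\lambda'_{ML}) = \frac{\lambda^2}{n-2}$, which is larger than the C-R limit $\frac{\lambda^2}{n}$. It can be shown that the C-R limit can't be attained
for any estimator. In fact, $\hat\lambda'_{ML}$ is an unbiased MVE.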
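The moments used in EX 41 can be evaluated exactly with the gamma function. The Python sketch below is an illustrative addition; the function name `gamma_moment` and the values n = 10, $\lambda$ = 2 are assumptions. It relies on the standard moment formula $E(Y^r) = \Gamma(k+r)/(\lambda^r\Gamma(k))$ for $Y \sim \text{Gamma}(\lambda,k)$ with rate $\lambda$ and $k+r > 0$.

```python
import math

def gamma_moment(r, k, lam):
    """E(Y^r) for Y ~ Gamma(lam, k) with rate lam and shape k (requires k + r > 0)."""
    return math.gamma(k + r) / (lam ** r * math.gamma(k))

n, lam = 10, 2.0  # hypothetical sample size and rate

# sum(Y_i) ~ Gamma(lam, n), so moments of 1/sum(Y_i) use r = -1 and r = -2.
m1 = gamma_moment(-1, n, lam)
m2 = gamma_moment(-2, n, lam)

mean_ml = n * m1                                # E of the ML estimator n / sum(Y_i)
mean_corrected = (n - 1) * m1                   # E of the corrected estimator (n - 1) / sum(Y_i)
var_corrected = (n - 1) ** 2 * (m2 - m1 ** 2)   # should equal lam^2 / (n - 2)
cr_limit = lam ** 2 / n

print(mean_ml, mean_corrected, var_corrected, cr_limit)
```

With n = 10 and $\lambda$ = 2 the ML estimator has expectation 20/9, the corrected estimator has expectation 2, and its variance 0.5 exceeds the C-R limit 0.4.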
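EX 42 Let Y(t) be a Poisson process of rate $\lambda$.

a) Find an unbiased estimator of $\lambda$ and compute the variance of the estimator. Determine the C-R limit and
compare this with the variance.

In this case the Likelihood is of a different form:

$$L(\lambda) = P(Y(t) = y) = \frac{(\lambda t)^y}{y!}e^{-\lambda t} \Rightarrow \ln(L(\lambda)) = y(\ln(\lambda)+\ln(t)) - \ln(y!) - \lambda t \Rightarrow \frac{d\ln(L(\lambda))}{d\lambda} = \frac{y}{\lambda} - t = 0 \Rightarrow \hat\lambda_{ML} = \frac{Y(t)}{t}.$$

(Notice here that we use the notation for an estimator rather than the estimate $\frac{y}{t}$.)

$$E(\hat\lambda_{ML}) = \frac{1}{t}E(Y(t)) = \frac{\lambda t}{t} = \lambda \text{ (Unbiased)}, \qquad V(\hat\lambda_{ML}) = \frac{1}{t^2}V(Y(t)) = \frac{1}{t^2}\lambda t = \frac{\lambda}{t} \to 0 \text{ as } t \to \infty.$$

$$\frac{d\ln(L(\lambda))}{d\lambda} = \frac{Y(t)}{\lambda} - t \Rightarrow \frac{d^2\ln(L(\lambda))}{d\lambda^2} = -\frac{Y(t)}{\lambda^2} \Rightarrow I(\lambda) = -E\left(-\frac{Y(t)}{\lambda^2}\right) = \frac{1}{\lambda^2}E(Y(t)) = \frac{1}{\lambda^2}\lambda t = \frac{t}{\lambda}.$$

Thus, the variance equals the C-R limit $1/I(\lambda)$ and we conclude that $\hat\lambda_{ML}$ is an unbiased MVE.

b) Find the ML estimator of $P(Y(t) = 0) = e^{-\lambda t}$. Compute an estimate of the latter when t = 0.5 and we have
observed that Y(5) = 10.

The ML estimator is $e^{-\hat\lambda_{ML}t}$, which gives the estimate $e^{-(10/5)\cdot 0.5} = e^{-1} \approx 0.37$. The latter estimator is in fact biased,
but it can be shown that the bias tends to zero with increasing t.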
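The plug-in step in EX 42 b) is short enough to write out. The Python sketch below is an illustration, not part of the original text, and the function name `ml_prob_zero` is hypothetical. It first estimates the rate from the observed count and then inserts the estimate into $e^{-\lambda t}$.

```python
import math

def ml_prob_zero(count, observed_time, t):
    """ML estimate of P(Y(t) = 0) = exp(-lambda * t) for a Poisson process,
    given `count` events observed during `observed_time`."""
    lam_hat = count / observed_time      # ML estimate of the rate, Y(t) / t
    return math.exp(-lam_hat * t)        # plug-in estimate of exp(-lambda * t)

estimate = ml_prob_zero(count=10, observed_time=5.0, t=0.5)
print(round(estimate, 2))
```

This uses the property stated in Ch. 4.3.4 that the ML estimator of a continuous function $g(\lambda)$ is $g(\hat\lambda_{ML})$.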
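EX 43 Let $(Y_1,\dots,Y_k) \sim \text{Multinomial}(n, p_1,\dots,p_k)$. Determine the ML estimators of $p_1,\dots,p_k$.

$$L = C\prod_{i=1}^k p_i^{y_i} = C\prod_{i=1}^{k-1} p_i^{y_i}\left(1-\sum_{i=1}^{k-1} p_i\right)^{y_k}, \quad\text{where } C = \frac{n!}{y_1!\cdots y_k!}.$$

We present two solutions, one without and one with the use of Lagrange's multiplier. Without Lagrange's multiplier:

$$\ln L = \ln C + \ln\prod_{i=1}^{k-1} p_i^{y_i} + \ln\left(1-\sum_{i=1}^{k-1} p_i\right)^{y_k} = \ln C + \sum_{i=1}^{k-1} y_i\ln p_i + y_k\ln\left(1-\sum_{i=1}^{k-1} p_i\right)$$

$$\frac{\partial\ln L}{\partial p_i} = 0 + \frac{y_i}{p_i} + y_k\frac{-1}{1-\sum_{i=1}^{k-1} p_i} = \frac{y_i}{p_i} - \frac{y_k}{p_k} = 0 \Rightarrow p_i = y_i\frac{p_k}{y_k},\ i = 1,\dots,k-1 \quad (i)$$

Since $\sum_{i=1}^k p_i = 1$ we get $\sum_{i=1}^k y_i\frac{p_k}{y_k} = \frac{p_k}{y_k}\sum_{i=1}^k y_i = \frac{p_k}{y_k}n = 1 \Rightarrow p_k = \frac{y_k}{n}$, and this inserted into (i) gives
$\hat p_i = \frac{y_i}{n}$.

With Lagrange's multiplier:

$\ln L = \ln C + \sum_{i=1}^k y_i\ln p_i$ is to be maximized subject to the condition $\sum_{i=1}^k p_i = 1$ (ii).

Put $Q = \ln C + \sum_{i=1}^k y_i\ln p_i + \lambda\left(\sum_{i=1}^k p_i - 1\right) \Rightarrow \frac{\partial Q}{\partial p_i} = 0 + \frac{y_i}{p_i} + \lambda = 0 \Rightarrow p_i = -\frac{y_i}{\lambda} = \lambda'y_i$. (iii) Putting this
into (ii) gives $\lambda'\sum_{i=1}^k y_i = \lambda'n = 1 \Rightarrow \lambda' = 1/n$, which inserted into (iii) gives $\hat p_i = y_i/n$.

The difficulty in this example arises from the fact that there are just k-1 genuine (linearly independent) parameters
to estimate.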
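A quick numerical check of EX 43: the Python sketch below (added for illustration, with made-up counts and the hypothetical helper names `multinomial_mle` and `log_kernel`) computes $\hat p_i = y_i/n$ and compares the log-likelihood, apart from the constant ln C, with its value at a few other probability vectors.

```python
import math

def multinomial_mle(counts):
    """ML estimates p_i = y_i / n for multinomial counts y_1, ..., y_k."""
    n = sum(counts)
    return [y / n for y in counts]

def log_kernel(counts, probs):
    """Log-likelihood without the constant ln C: sum of y_i * ln(p_i)."""
    return sum(y * math.log(p) for y, p in zip(counts, probs))

counts = [12, 30, 18]                 # hypothetical counts, n = 60
p_hat = multinomial_mle(counts)       # [0.2, 0.5, 0.3]

# A few alternative probability vectors (each sums to 1) for comparison.
alternatives = [[0.25, 0.45, 0.30], [0.2, 0.4, 0.4], [1 / 3, 1 / 3, 1 / 3]]
best = log_kernel(counts, p_hat)
print(p_hat, best, [log_kernel(counts, alt) for alt in alternatives])
```

Every alternative gives a smaller log-likelihood than $\hat p$, as expected for the maximizer under the constraint $\sum p_i = 1$.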
EX 44 $(Y_i)_{i=1}^n$ are iid where $Y_i \sim \text{Gamma}(\lambda,k)$.

a) Determine the ML estimators of $\lambda$ and k.

From EX 33 e) the likelihood is $L = \frac{\lambda^{nk}}{\Gamma(k)^n}\prod_{i=1}^n y_i^{k-1}e^{-\lambda\sum y_i}$, so

$$\ln L = nk\ln\lambda - n\ln\Gamma(k) + (k-1)\sum_{i=1}^n \ln y_i - \lambda\sum_{i=1}^n y_i \Rightarrow \frac{\partial\ln L}{\partial\lambda} = \frac{nk}{\lambda} - \sum_{i=1}^n y_i, \quad \frac{\partial\ln L}{\partial k} = n\ln\lambda - n\frac{d\ln\Gamma(k)}{dk} + \sum_{i=1}^n \ln y_i.$$

Setting the derivatives to zero gives

$$\frac{nk}{\lambda} - \sum_{i=1}^n y_i = 0 \Rightarrow \hat\lambda_{ML} = \frac{k}{\bar y} \quad (i) \quad\text{and}\quad n\ln\left(\frac{k}{\bar y}\right) - n\frac{d\ln\Gamma(k)}{dk} + \sum_{i=1}^n \ln y_i = 0 \quad (ii)$$

Rearranging the terms in equation (ii) gives $\ln k - \frac{d\ln\Gamma(k)}{dk} = \ln\bar y - \frac{\sum\ln y_i}{n}$ (iii)

By first solving (iii) for k and then putting this into (i) we obtain the solutions. However, (iii) has to be solved iteratively.
How this can be done is illustrated in b) below.

b) From a sample of n = 100 observations the following quantities are calculated:

$\sum y_i = 223.56$, $\sum y_i^2 = 619.0525$, $\sum\ln y_i = 66.3803$

Compute the ML estimates of $\lambda$ and k.

The right hand side of (iii) is ln(2.2356) - 0.6638 = 0.1407. With $g(k) = \ln k - \frac{d\ln\Gamma(k)}{dk}$ we want to determine the value
of k such that g(k) = 0.1407. The function $\frac{d\ln\Gamma(k)}{dk}$ is well known in mathematics and is called the digamma
function. We can thus plot g(k) against k to find a solution of k.

Some help in the search for a solution is to use the estimate obtained by the Method of Moments. In EX 38 it was
shown that $\hat k_{Mom} = \bar y^2/s^2$. Now, $s^2 = (619.0525 - (223.56)^2/100)/(100-1) = 1.2046$ and $\bar y = 2.2356$, so
$\hat k_{Mom} = 4.15$. It is felt that a search for k in the interval [3.00, 5.00] should suffice.


The following program code (written in SAS) can be used to find k. 
data a; 
do k=3 to 5 by 0.01; g=log(k)-digamma(k); output; end; 


proc print; var k g; run; 


The solution is $\hat k_{ML} = 3.71$ and putting this into (i) finally gives $\hat\lambda_{ML} = 3.71/2.2356 = 1.66$.
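The same search can be carried out without SAS. The Python sketch below is an alternative illustration, not the author's program. Python's standard library has no digamma function, so the helper `digamma` is an assumption of this sketch, built from the recurrence $\psi(x) = \psi(x+1) - 1/x$ and the standard asymptotic series. The function names `g` and `solve_shape` are also ours, and bisection replaces the grid search, using that g(k) is decreasing in k.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x upwards with psi(x) = psi(x + 1) - 1/x,
    then apply the asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def g(k):
    """Left hand side of equation (iii): ln k minus digamma(k)."""
    return math.log(k) - digamma(k)

def solve_shape(target, lo=3.0, hi=5.0, tol=1e-8):
    """Bisection for g(k) = target on [lo, hi]; g is decreasing in k."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, sum_y, sum_log_y = 100, 223.56, 66.3803   # summary statistics from EX 44 b)
ybar = sum_y / n
target = math.log(ybar) - sum_log_y / n      # right hand side of (iii)
k_hat = solve_shape(target)
lam_hat = k_hat / ybar                       # equation (i)
print(round(k_hat, 2), round(lam_hat, 2))
```

The interval [3.00, 5.00] suggested by the moment estimate brackets the root, since g(3) is larger and g(5) is smaller than the target 0.1407.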
4.4 Final words


In this chapter we have only considered estimation of parameters and functions of parameters. E.g. we can
estimate $p(y) = \lambda^ye^{-\lambda}/y!$ by plugging in an estimate of $\lambda$. It is also possible to estimate p(y), f(y), F(y)
etc. directly from data without model assumptions, but such procedures are beyond the scope of this book.


The information inequality in (22) seems to have been first discovered by Aitken and Silverstone in 1942
during the second World War. During the 1920s Fisher showed that $V(\hat\theta_{ML}) \approx 1/I(\theta)$ in large samples.


Consider the estimation of $\mu$ in the normal distribution. The estimator $\bar Y$ is unbiased for $\mu$ and has
variance $\sigma^2/n$. An alternative estimator is the sample median, say m. This is also unbiased and has
variance $\pi\sigma^2/2n$ in large samples (Rao (1965), p. 356). The relative efficiency is $V(\bar Y)/V(m) = 2/\pi \approx 0.64$,
and from this it seems obvious that the sample mean is to be preferred. However, there may be other
aspects to take account of. In some cases the median is easier to use or can be computed more rapidly.
As an example, consider estimation of the mean life length of rats that have been exposed to some drug.
If we use the sample mean we have to wait until all rats have died (which may take years). By using the
sample median we only have to wait until half of the rats have died.
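The large-sample value $2/\pi$ can be illustrated by simulation. The Python sketch below is an addition for illustration; the function name `relative_efficiency_median`, the seed, the sample size n = 101 and the number of replicates are assumptions. It draws normal samples and compares the variance of the sample mean with that of the sample median.

```python
import random
import statistics

def relative_efficiency_median(n, reps, seed=0):
    """Simulated V(mean) / V(median) for samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means) / statistics.pvariance(medians)

re = relative_efficiency_median(n=101, reps=4000)
print(round(re, 2))   # expected to be in the neighbourhood of 2/pi
```

The simulated ratio varies with the seed and the number of replicates, so it should only be read as roughly confirming the theoretical value.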


In some text books one can find the concepts Best Asymptotic Normal (BAN) estimator and Consistent 
Asymptotic Normal (CAN) estimator. The ML estimator is both BAN and CAN. 
Supplementary Exercises, Ch. 4


EX 45 $(Y_i)_{i=1}^n$ are iid with $Y_i \sim \text{Uniform}[0,b]$.


a) Find an unbiased estimator of b based on the largest observation $Y_{(n)}$ and determine the variance of the estimator.

b) Show that the methods of OLS and Moments give identical unbiased estimators and determine the variance of
the estimators.

c) Compare the relative efficiency of the estimators in a) and b).

d) The waiting times at a red traffic light were recorded 10 times and gave the following values (in seconds): 8, 13,
16, 12, 46, 4, 22, 17, 34, 28. Use the data to estimate the time for each red light period. [Choose any estimator
you want. Which one is most reliable?]


EX 46 $(Y_i)_{i=1}^n$ are independent with $Y_i \sim \text{Binomial}(n_i, p)$.


a) Find unbiased estimators of p by using the OLS- and ML methods. Compare the variances of the estimators.

b) Show that the ML estimator is BLUE, in contrast to the OLS estimator, and is in fact a MVE.

c) To estimate p = 'Proportion of students with back/neck pain', a sample of students in three class-rooms was
taken with the following result:


Room   Total number of students   Number with back/neck pain
1      30                         1
2      25                         3
3      35                         2


Compute the OLS- and ML estimates of p.


EX 47 $(Y_i)_{i=1}^n$ are iid with $Y_i \sim \text{Geometric}(p)$.

a) Find the ML estimator of p.

b) Sometimes it is practical to use sequentially collected data, rather than data with a fixed sample size n. Consider
the following (fictive) data collected from a stream of students passing by: (0,0,0,1,1,0,0,1), where '1' indicates
that the student visited a pub last night and '0' that the student did not visit a pub. Estimate the proportion of
students who visited a pub last night.


EX 48 $(Y_i)_{i=1}^n$ are independent with $Y_i \sim N(\beta x_i,\sigma^2)$.


a) Find the ML estimator of $\beta$. Show that it is BLUE and determine the distribution of the estimator.
b) Find the ML estimator of $\sigma^2$ and show that it is biased. Remove the bias and determine the distribution of the
corrected estimator.
EX 49 In order to estimate mean $\mu$ and variance $\sigma^2$ in a normally distributed population, one takes independent
samples from different regions. Determine the BLUE of the two parameters from the following data:


Region            1           ...   k
Sample size       $n_1$       ...   $n_k$
Sample mean       $\bar Y_1$  ...   $\bar Y_k$
Sample variance   $S_1^2$     ...   $S_k^2$


EX 50 Let $(Y_1,Y_2,Y_3) \sim \text{Multinomial}(n, p, 2p, (1-3p))$. Find the ML estimator of p and check whether it is MVU.
5 Interval estimation


In the previous chapter we considered the estimation of an unknown population parameter $\theta$ at a single
point. In this chapter we will show how it is possible to construct intervals that enclose $\theta$ with a certain
degree of confidence. This approach is more informative than that of Ch. 4 since it does not only tell
us about the location of the parameter value, but also how confident we should be with the estimate.


5.1 Concepts 


A confidence interval (CI) is a pair of statistics $(\hat\theta_L,\hat\theta_U)$, $\hat\theta_L < \hat\theta_U$, that encloses $\theta$ with probability
$1-\alpha$, the latter being termed the confidence coefficient or confidence level. The use of $1-\alpha$ is somewhat
confusing, but its origin will be evident in Ch. 6.

Some properties of a CI:


- $\hat\theta_L$ and $\hat\theta_U$ are both functions of a random sample $Y_1,\dots,Y_n$ and therefore the location and
length of the CI will vary randomly.

- There is no guarantee that a specific CI, which is a function of $y_1,\dots,y_n$, contains the true value
of $\theta$. All we know is that a sequence of specific CIs will contain $\theta$ in $100(1-\alpha)$% of all cases
in the long run.

- It is desirable that the CI is short, in order to be informative, and also that $1-\alpha$ is high, so
that the CI is reliable. However these two aspects are incompatible. By increasing $1-\alpha$ we are
also increasing the length of the CI.

- There is no golden rule to solve the conflict between length and level of a CI. In practice it
is up to the statistician to use common sense in this matter. If the sample size is small and if
population variance V(Y) is large, then one should be prepared to decrease the confidence
level rather than stick to the conventional level of 0.95.


In Ch. 4 there were some requirements on an estimator, and especially that it should be an unbiased
MVE. Here we define a 'best' interval estimator by the requirement that $E(\hat\theta_U-\hat\theta_L)$ is minimum for
given n and $1-\alpha$.


An important method for finding a CI for $\theta$ is to find a pivotal statistic, i.e. a statistic with the following
properties:


1) It is a function of $Y_1,\dots,Y_n$ and $\theta$.
2) Its probability distribution does not depend on $\theta$.


An example of a pivotal statistic is $S^2(n-1)/\sigma^2 \sim \chi^2(n-1)$ (Cf. EX 18). The subsequent examples will
show how the pivotal method works.
5.2 CIs in small samples by means of pivotal statistics


EX 51 $(Y_i)_{i=1}^n$ are iid where $Y_i \sim N(\mu,\sigma^2)$.


a) Construct 95% CIs for $\mu$ and $\sigma^2$ when n = 10.
b) Determine the specific CIs when $\bar y = 80$ and $s^2 = 81$.


a) CI for $\mu$.

From Ch. 3.1 (17),

$$\frac{\bar Y-\mu}{S/\sqrt n} = \frac{(\bar Y-\mu)/(\sigma/\sqrt n)}{(S/\sqrt n)/(\sigma/\sqrt n)} \sim \frac{N(0,1)}{\sqrt{\chi^2(n-1)/(n-1)}} \sim T(n-1).$$

This quantity is pivotal. Let C be some constant. Then, since the T-distribution is symmetric around zero,

$$1-\alpha = P\left(-C < \frac{\bar Y-\mu}{S/\sqrt n} < C\right) = P\left(\bar Y - C\frac{S}{\sqrt n} < \mu < \bar Y + C\frac{S}{\sqrt n}\right) \quad (i)$$

$$1-\alpha = P(-C < T(n-1) < C) \quad (ii)$$

In (i) we have simply rearranged the inequality so that $\mu$ is centered. (ii) is used to determine C, in which case we
must know the confidence level $1-\alpha$ and n.

With $1-\alpha = 0.95$ we get: $P(-C < T(9) < C) = 0.95 \Rightarrow C = 2.262$, obtained from tables of the T-distribution. The
95% CI for $\mu$ is thus $\left(\bar Y - 2.262\frac{S}{\sqrt{10}},\ \bar Y + 2.262\frac{S}{\sqrt{10}}\right)$.

Notice that the interval can be constructed before the sample is taken.


CI for $\sigma^2$

From EX 18, $S^2 \sim \frac{\sigma^2}{n-1}\chi^2(n-1)$ which is not pivotal, but $\frac{S^2(n-1)}{\sigma^2} \sim \chi^2(n-1)$ is. The chi-square
distribution is not symmetric so we consider two constants a and b such that

$$1-\alpha = P\left(a < \frac{S^2(n-1)}{\sigma^2} < b\right) = P\left(\frac{S^2(n-1)}{b} < \sigma^2 < \frac{S^2(n-1)}{a}\right) = P(a < \chi^2(n-1) < b).$$

With $1-\alpha = 0.95$ the area below a under the chi-square density is 0.025 and the area above b is 0.025. Thus,
$P(a < \chi^2(9) < b) = 0.95 \Rightarrow a = 2.7004,\ b = 19.0228$. The 95% CI for $\sigma^2$ is $\left(\frac{9S^2}{19.0228},\ \frac{9S^2}{2.7004}\right)$.


b)

The lower and upper limits in the specific CI for $\mu$ are $80 \pm 2.262\frac{9}{\sqrt{10}} = 80 \pm 6.4$, so the 95% CI is (73.6, 86.4).

The specific CI for $\sigma^2$ is $\left(\frac{9\cdot 81}{19.0228},\ \frac{9\cdot 81}{2.7004}\right)$, or (38.3, 270.0). This interval is very wide, as it should be since
variance is a squared quantity, such as dollar$^2$ or kg$^2$. It would be wise to use a confidence level lower than 95% in
this case.
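The two specific intervals in EX 51 b) can be reproduced with a few lines of Python. The sketch below is an illustrative addition; the function names `normal_mean_ci` and `normal_variance_ci` are hypothetical, and the quantiles 2.262, 2.7004 and 19.0228 are the table values quoted in the solution, passed in as arguments because the Python standard library has no T- or chi-square quantile functions.

```python
import math

def normal_mean_ci(ybar, s2, n, t_quantile):
    """CI for mu: ybar +/- C * s / sqrt(n), with C taken from the T(n - 1) table."""
    half_width = t_quantile * math.sqrt(s2 / n)
    return ybar - half_width, ybar + half_width

def normal_variance_ci(s2, n, chi2_lower, chi2_upper):
    """CI for sigma^2: ((n - 1) s^2 / b, (n - 1) s^2 / a), with a < b from the chi-square(n - 1) table."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

mean_ci = normal_mean_ci(ybar=80.0, s2=81.0, n=10, t_quantile=2.262)
var_ci = normal_variance_ci(s2=81.0, n=10, chi2_lower=2.7004, chi2_upper=19.0228)
print([round(v, 1) for v in mean_ci], [round(v, 1) for v in var_ci])
```

Note that the upper chi-square quantile b gives the lower limit for $\sigma^2$ and the lower quantile a gives the upper limit, because $\sigma^2$ sits in the denominator of the pivotal statistic.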
EX 52 $(Y_i)_{i=1}^n$ are iid with $Y_i \sim \text{Uniform}[0,b]$.


a) Construct a 95% CI for b based on the largest observation in the sample.
b) Determine the specific 95% CI from the following data: 28, 19, 31, 12, 15.


a)

From (8) in Ch. 3.1 the cdf of the largest observation $Y_{(n)}$ is $F_{Y_{(n)}}(y) = \left(\frac{y}{b}\right)^n$, $0 \le y \le b$. $Y_{(n)}$ is
not pivotal, but we may try to find a pivotal statistic by the following device. Consider $CY_{(n)}$ with cdf

$$F_{CY_{(n)}}(y) = P(CY_{(n)} \le y) = P(Y_{(n)} \le y/C) = F_{Y_{(n)}}(y/C) = \left(\frac{y}{Cb}\right)^n.$$

This is not dependent on b if C = 1/b. Thus, $Y_{(n)}/b$ is pivotal with cdf $F_{Y_{(n)}/b}(y) = y^n$, $0 \le y \le 1$.

We now proceed as in EX 51.

$$P(Y_{(n)}/b < c_1) = c_1^n = 0.025 \Rightarrow c_1 = 0.025^{1/n}$$

$$P(Y_{(n)}/b < c_2) = c_2^n = 0.975 \Rightarrow c_2 = 0.975^{1/n}$$

so that $0.95 = P(c_1 < Y_{(n)}/b < c_2)$.

It remains to put the parameter b in the center,

$$P(c_1 < Y_{(n)}/b < c_2) = P\left(\frac{Y_{(n)}}{c_2} < b < \frac{Y_{(n)}}{c_1}\right) \Rightarrow \left(\frac{Y_{(n)}}{0.975^{1/n}},\ \frac{Y_{(n)}}{0.025^{1/n}}\right) \text{ is a 95% CI for } b.$$


b)

Here n = 5 and $y_{(5)} = 31 \Rightarrow \left(\frac{31}{0.975^{1/5}},\ \frac{31}{0.025^{1/5}}\right) = (31.2, 64.8)$.
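The interval in EX 52 needs no tables, since the pivotal statistic $Y_{(n)}/b$ has the explicit cdf $y^n$. The Python sketch below is an illustration that is not part of the original text, and the function name `uniform_upper_ci` is an assumption. It computes the CI for the data in b).

```python
def uniform_upper_ci(y_max, n, level=0.95):
    """CI for b in Uniform[0, b] based on the largest observation y_max.

    Uses the pivot Y_(n) / b, whose cdf is y^n on [0, 1]."""
    alpha = 1.0 - level
    c1 = (alpha / 2) ** (1.0 / n)        # P(Y_(n)/b < c1) = alpha / 2
    c2 = (1 - alpha / 2) ** (1.0 / n)    # P(Y_(n)/b < c2) = 1 - alpha / 2
    return y_max / c2, y_max / c1

data = [28, 19, 31, 12, 15]
ci = uniform_upper_ci(max(data), len(data))
print([round(v, 1) for v in ci])
```

The lower limit is only slightly above the largest observation 31, while the upper limit is more than twice as large; the interval is strongly asymmetric around $y_{(n)}$.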


EX 53 Given two independent sets of iid variables $(X_i)_{i=1}^{n_X}$ and $(Y_i)_{i=1}^{n_Y}$ where $X_i \sim N(\mu_X,\sigma_X^2)$ and
$Y_i \sim N(\mu_Y,\sigma_Y^2)$.


a) Construct a CI for the ratio $\sigma_X^2/\sigma_Y^2$.
b) Construct a CI for the difference $(\mu_X-\mu_Y)$ assuming that $\sigma_X^2 = \sigma_Y^2 = \sigma^2$.


a)

Consider the two unbiased estimators $\hat\sigma_X^2 = S_X^2$ and $\hat\sigma_Y^2 = S_Y^2$. These are independent since they are based on
independent sets of variables. From EX 18 in Ch. 3.1 we know that

$$S_X^2 \sim \frac{\sigma_X^2}{n_X-1}\chi^2(n_X-1) \text{ and } S_Y^2 \sim \frac{\sigma_Y^2}{n_Y-1}\chi^2(n_Y-1) \Rightarrow \frac{S_X^2}{S_Y^2} \sim \frac{\sigma_X^2}{\sigma_Y^2}\cdot\frac{\chi^2(n_X-1)/(n_X-1)}{\chi^2(n_Y-1)/(n_Y-1)} = \frac{\sigma_X^2}{\sigma_Y^2}\cdot F(n_X-1,n_Y-1)$$

[Cf. the F-distribution in Ch. 3.1.]

The latter quantity is not pivotal, but $\frac{S_X^2}{S_Y^2}\cdot\frac{\sigma_Y^2}{\sigma_X^2} \sim F(n_X-1,n_Y-1)$ is.

The F-distribution is not symmetric so we choose two constants $c_1$ and $c_2$ as limits in the following inequality

$$1-\alpha = P\left(c_1 < \frac{S_X^2}{S_Y^2}\cdot\frac{\sigma_Y^2}{\sigma_X^2} < c_2\right) = P\left(\frac{S_X^2}{c_2S_Y^2} < \frac{\sigma_X^2}{\sigma_Y^2} < \frac{S_X^2}{c_1S_Y^2}\right) \quad (i)$$

$$1-\alpha = P(c_1 < F(n_X-1,n_Y-1) < c_2) \quad (ii)$$

In (i) we have simply centered the variance ratio. The expression in (ii) is used to determine the two constants
$c_1$ and $c_2$. However, this may be cumbersome, especially since F-tables often are incomplete. We devote a few
lines to show how this can be done.

Let $n_X = 25$, $n_Y = 10$ and $1-\alpha = 0.95$. We want to determine $c_1$ and $c_2$ so that
(ii) holds. Since $P(F(24,9) > c_2) = 0.025$ we obtain $c_2 = 3.61$ by using Table 7 in
Wackerly et al. It is harder to find the value of $c_1$ from the table. The value of $c_1$ giving
$P(F(24,9) > c_1) = 0.975$ or $P(F(24,9) < c_1) = 0.025$ is not shown. Instead we use the fact that [See Ch. 3.1 (12).]

$$P(F(24,9) < c_1) = P\left(\frac{1}{F(24,9)} > \frac{1}{c_1}\right) = P\left(F(9,24) > \frac{1}{c_1}\right) = 0.025 \Rightarrow \frac{1}{c_1} = 2.70 \Rightarrow c_1 = 0.37.$$

In this case the 95% CI for $\frac{\sigma_X^2}{\sigma_Y^2}$ is $\left(\frac{S_X^2}{3.61\cdot S_Y^2},\ \frac{S_X^2}{0.37\cdot S_Y^2}\right)$.

b)

$\bar X \sim N(\mu_X,\sigma^2/n_X)$, $\bar Y \sim N(\mu_Y,\sigma^2/n_Y) \Rightarrow (\bar X-\bar Y) \sim N(\mu_X-\mu_Y,\ \sigma^2(1/n_X+1/n_Y))$, since a linear function of
normally distributed variables is itself normally distributed (Cf. Ch. 2.2.2). Thus,

$$\frac{(\bar X-\bar Y)-(\mu_X-\mu_Y)}{\sqrt{\sigma^2(1/n_X+1/n_Y)}} \sim N(0,1).$$

This is pivotal, but it can't be used since $\sigma^2$ is unknown. We need an estimator of $\sigma^2$.

$$\hat\sigma^2 = \frac{(n_X-1)S_X^2+(n_Y-1)S_Y^2}{n_X-1+n_Y-1}$$

From EX 49 it follows that $\hat\sigma^2$ is BLUE for $\sigma^2$. Since $(n_X-1)S_X^2 \sim \sigma^2\chi^2(n_X-1)$ and
$(n_Y-1)S_Y^2 \sim \sigma^2\chi^2(n_Y-1)$ it follows that

$$\hat\sigma^2 \sim \frac{\sigma^2\left(\chi^2(n_X-1)+\chi^2(n_Y-1)\right)}{n_X-1+n_Y-1} = \sigma^2\frac{\chi^2(n_X+n_Y-2)}{n_X+n_Y-2} \quad \text{[Cf. Ch. 3.1 (8)]}$$

The following statistic is pivotal and useful

$$\frac{(\bar X-\bar Y)-(\mu_X-\mu_Y)}{\sqrt{\hat\sigma^2(1/n_X+1/n_Y)}} = \frac{\dfrac{(\bar X-\bar Y)-(\mu_X-\mu_Y)}{\sqrt{\sigma^2(1/n_X+1/n_Y)}}}{\sqrt{\dfrac{\hat\sigma^2(1/n_X+1/n_Y)}{\sigma^2(1/n_X+1/n_Y)}}} \sim \frac{N(0,1)}{\sqrt{\dfrac{\chi^2(n_X+n_Y-2)}{n_X+n_Y-2}}} \sim T(n_X+n_Y-2)$$

Proceeding in the same way as in EX 51 we finally obtain the lower and upper CI limits as
$\bar X-\bar Y \pm C\sqrt{\hat\sigma^2(1/n_X+1/n_Y)}$, where C is determined from the $T(n_X+n_Y-2)$-distribution.
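The interval in EX 53 b) can be sketched in Python as follows. This is an illustration that is not part of the original text; the function names `pooled_variance` and `mean_difference_ci` and all numbers in the example are assumptions. The constant C = 2.306 is the tabulated 0.025 upper point of the T(8)-distribution and is passed in as an argument.

```python
import math

def pooled_variance(s2_x, n_x, s2_y, n_y):
    """Pooled estimator of the common variance sigma^2."""
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)

def mean_difference_ci(xbar, ybar, s2_x, n_x, s2_y, n_y, t_quantile):
    """CI for mu_X - mu_Y under equal variances:
    (xbar - ybar) +/- C * sqrt(pooled variance * (1/n_x + 1/n_y))."""
    s2_pooled = pooled_variance(s2_x, n_x, s2_y, n_y)
    half_width = t_quantile * math.sqrt(s2_pooled * (1.0 / n_x + 1.0 / n_y))
    diff = xbar - ybar
    return diff - half_width, diff + half_width

# Hypothetical summary statistics; C = 2.306 belongs to T(5 + 5 - 2) = T(8).
ci = mean_difference_ci(xbar=10.0, ybar=8.0, s2_x=4.0, n_x=5,
                        s2_y=9.0, n_y=5, t_quantile=2.306)
print([round(v, 2) for v in ci])
```

In this made-up example the pooled variance is 6.5 and the interval encloses zero, so the two sample means would not be declared significantly different at the 95% level.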


Comment to EX 53 The results in this exercise are crucial for comparing two means. By making a CI
for $(\mu_X-\mu_Y)$ the main interest is whether the CI encloses zero. If a 95% CI, or a CI of higher level,
does not contain zero it is customary to conclude that the two means are significantly different. This way
of claiming statistical significance is different from another one based on statistical hypothesis testing
that is considered in Ch. 6.


Notice that the first step was to make a CI for the variance ratio $\sigma_X^2/\sigma_Y^2$. If the latter encloses 1, the
customary conclusion is that there is no significant difference between the variances, which therefore
could be set equal. The CI for $(\mu_X-\mu_Y)$ was then constructed under this assumption. On the other
hand, if the CI for the variance ratio does not enclose 1 we can't claim that variances are equal and a
different approach is required. This is called the Behrens-Fisher problem, for which several approximate
solutions have been suggested. The latter are however beyond the level of this book. Most statistical
software presents solutions both with and without the assumption of equal variances.




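EX 54 $(Y_i)_{i=1}^n$ are independent with $Y_i \sim N(\beta x_i, \sigma^2)$. Construct CIs for $\beta$ and $\sigma^2$.

CI for $\beta$

From EX 48 we know that $\hat\beta_{ML} = \hat\beta_{OLS} = \sum x_iY_i / \sum x_i^2$ and that

$$\frac{\hat\beta_{ML}-\beta}{\sqrt{\sigma^2/\sum x_i^2}} \sim N(0,1).$$

The latter statistic is pivotal but it can't be used since $\sigma^2$ is unknown. As an unbiased estimator of $\sigma^2$ we take

$$\hat\sigma^2 = \frac{\sum (Y_i-\hat\beta_{ML}x_i)^2}{n-1} \sim \frac{\sigma^2\,\chi^2(n-1)}{n-1} \quad \text{[Cf. EX 48]},$$

where $\hat\beta_{ML}$ and $\hat\sigma^2$ are independent.

A pivotal statistic that is useful is

$$\frac{\hat\beta_{ML}-\beta}{\sqrt{\hat\sigma^2/\sum x_i^2}} = \frac{(\hat\beta_{ML}-\beta)\big/\sqrt{\sigma^2/\sum x_i^2}}{\sqrt{\hat\sigma^2/\sigma^2}} \sim \frac{N(0,1)}{\sqrt{\chi^2(n-1)/(n-1)}} \sim T(n-1),$$

the latter distribution being symmetric around zero. Thus

$$P\left(-C<\frac{\hat\beta_{ML}-\beta}{\sqrt{\hat\sigma^2/\sum x_i^2}}<C\right)=\begin{cases}P\left(\hat\beta_{ML}-C\sqrt{\hat\sigma^2/\sum x_i^2}<\beta<\hat\beta_{ML}+C\sqrt{\hat\sigma^2/\sum x_i^2}\right)\\ P(-C<T(n-1)<C)\end{cases}$$

The CI for $\beta$ is $\hat\beta_{ML} \pm C\sqrt{\hat\sigma^2/\sum x_i^2}$, where $C$ is determined from the $T(n-1)$-distribution. To illustrate the computation of $C$, let $n = 10$ and assume that we want a 90% CI for $\beta$. Since the area under the T-density between $-C$ and $C$ is 0.90, the area above $C$ is 0.05. (Most tables today show areas above $C$.) From the tables we get $C = 1.833$.

CI for $\sigma^2$

$\hat\sigma^2$ is not pivotal, but $\hat\sigma^2(n-1)/\sigma^2 \sim \chi^2(n-1)$ is. Therefore, and since the chi-square distribution is not symmetric, there are two constants $a$ and $b$ to be determined.

$$P\left(a<\frac{\hat\sigma^2(n-1)}{\sigma^2}<b\right)=\begin{cases}P\left(\dfrac{\hat\sigma^2(n-1)}{b}<\sigma^2<\dfrac{\hat\sigma^2(n-1)}{a}\right)\\ P(a<\chi^2(n-1)<b)\end{cases}$$

Here $a$ and $b$ are determined as in EX 51. The CI for $\sigma^2$ is thus

$$\left(\frac{\hat\sigma^2(n-1)}{b},\ \frac{\hat\sigma^2(n-1)}{a}\right).$$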
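The two intervals of EX 54 can be sketched numerically as follows. This is a minimal illustration, not part of the original solution: the data are hypothetical, and the critical values are passed in as arguments because they have to come from tables or software (here $C = 1.833$ for $T(9)$ as in the text, and the 5% and 95% points of $\chi^2(9)$, 3.325 and 16.919, are taken from standard tables).

```python
import math

def ci_regression_through_origin(x, y, t_crit, chi2_a, chi2_b):
    """CIs for beta and sigma^2 in the model Y_i ~ N(beta * x_i, sigma^2).

    t_crit         : C from the T(n-1) distribution
    chi2_a, chi2_b : lower and upper points a < b of the chi-square(n-1) distribution
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Unbiased estimator of sigma^2 (division by n - 1, one parameter estimated).
    sigma2_hat = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    half_width = t_crit * math.sqrt(sigma2_hat / sxx)
    ci_beta = (beta_hat - half_width, beta_hat + half_width)
    ci_sigma2 = (sigma2_hat * (n - 1) / chi2_b, sigma2_hat * (n - 1) / chi2_a)
    return beta_hat, sigma2_hat, ci_beta, ci_sigma2

# Hypothetical data with n = 10, used only to exercise the formulas.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

beta_hat, sigma2_hat, ci_beta, ci_sigma2 = ci_regression_through_origin(
    x, y, t_crit=1.833, chi2_a=3.325, chi2_b=16.919
)
print("beta_hat =", round(beta_hat, 4), " 90% CI:", ci_beta)
print("sigma2_hat =", round(sigma2_hat, 4), " 90% CI:", ci_sigma2)
```

Note that the interval for $\sigma^2$ is not centred at $\hat\sigma^2$, which reflects the asymmetry of the chi-square distribution.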
5.3 Approximate CIs in large samples based on Central Limit Theorems


In Ch. 2.2.2 (a) the Central Limit Theorem (CLT) was stated for a standardized sum of iid variables, denoted by $Z_n$, and for a Poisson process in Ch. 2.2.2 (c), denoted by $Z(t)$. In Ch. 4.3.4 it was stated that standardized ML estimators have asymptotic $N(0,1)$-distributions. These results can be used to find CIs that hold approximately in large samples. Since the CIs only hold approximately it is important to understand the meaning of 'a large sample'. Below some examples of CIs derived from CLTs are given.


EX 55 $Y \sim \text{Binomial}(n, p)$, or equivalently $(Y_i)_{i=1}^n$ are iid with $Y_i \sim \text{Bernoulli}(p)$. Determine a CI for $p$ based on $Y=\sum_{i=1}^n Y_i$ and a large $n$. Put $\hat p = Y/n$.

a) Give a justification for the formula $\hat p \pm C\sqrt{\hat p(1-\hat p)/n}$ that is often found in textbooks. (Sometimes division by $n-1$ is used instead of $n$.)

From EX 23 c), with $Z_n' = (\hat p-p)/\sqrt{\hat p(1-\hat p)/n}$,

$$P\left(-C<\frac{\hat p-p}{\sqrt{\hat p(1-\hat p)/n}}<C\right)=\begin{cases}P\left(\hat p-C\sqrt{\hat p(1-\hat p)/n}<p<\hat p+C\sqrt{\hat p(1-\hat p)/n}\right)\\ P(-C<Z_n'<C)\end{cases}$$

where $C$ is determined from tables of the Normal distribution.

b) Show that a more accurate CI for $p$ is given by the limits

$$\frac{2\hat p+C^2/n \pm \sqrt{(2\hat p+C^2/n)^2-4\hat p^2(1+C^2/n)}}{2(1+C^2/n)}$$

From EX 23 b), with $Z_n = (\hat p-p)/\sqrt{p(1-p)/n}$,

$$P\left(-C<\frac{\hat p-p}{\sqrt{p(1-p)/n}}<C\right)=\begin{cases}P\left((\hat p-p)^2<C^2\,\dfrac{p(1-p)}{n}\right) & \text{(i)}\\ P(-C<Z_n<C) & \text{(ii)}\end{cases}$$

By solving the inequality in (i) for $p$ it can be shown that $p$ is located between the limits stated in b). $C$ in (ii) is obtained from the normal distribution. The CI in b) is more accurate since the approach to normality goes much faster for $Z_n$ than for $Z_n'$. Notice that the approach to normality for $Z_n'$ requires not only convergence in distribution, but also convergence in probability, as was shown in EX 23 c).
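The step "solving the inequality in (i) for $p$" can be written out as follows. This is a sketch of the algebra only; it uses nothing beyond the symbols $\hat p$, $C$ and $n$ already introduced in the exercise.

```latex
\begin{align*}
(\hat p - p)^2 < C^2\,\frac{p(1-p)}{n}
 &\iff \hat p^2 - 2\hat p\,p + p^2 < \frac{C^2}{n}\,p - \frac{C^2}{n}\,p^2 \\
 &\iff \Bigl(1+\frac{C^2}{n}\Bigr)p^2 - \Bigl(2\hat p+\frac{C^2}{n}\Bigr)p + \hat p^2 < 0 .
\end{align*}
% The left-hand side is a quadratic in p with a positive leading coefficient,
% so the inequality holds exactly when p lies between its two roots:
\[
p_{L,U} = \frac{2\hat p + C^2/n \pm \sqrt{(2\hat p + C^2/n)^2 - 4\hat p^2\,(1+C^2/n)}}{2\,(1+C^2/n)} .
\]
```

As $n \to \infty$ the terms $C^2/n$ vanish inside the fraction and the limits approach those of the simpler expression in a).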


Comment to EX 55 It is important to know the difference between the expressions given in EX 55 a) and b). The former is often stated in textbooks as being a result of the CLT and sometimes minimum sample sizes of 30-50 are advocated. This may hold for the validity of the expression in b), but definitely not for the expression in a).


In 1963 an interesting relation was found between the Binomial and F distributions by G.H. Jowett. By using this it is possible to obtain a CI for $p$ that holds for any sample size. The latter may be hard to find in textbooks at master level in statistics, but we present it here since the result is very useful. (Cf. Casella & Berger 1990, p. 499.)

Let $F_{.975}(f_1, f_2)$ be the 97.5% percentile of the F distribution, i.e. $P(F(f_1, f_2) > F_{.975}(f_1, f_2)) = 0.025$. Then a 95% CI for $p$ is given by

$$\frac{Y}{Y+(n-Y+1)\,F_{.975}(2(n-Y+1),\,2Y)} < p < \frac{(Y+1)\,F_{.975}(2(Y+1),\,2(n-Y))}{n-Y+(Y+1)\,F_{.975}(2(Y+1),\,2(n-Y))} \qquad (23)$$

If $Y = 0$ then the lower limit is 0 and if $Y = n$ then the upper limit is 1.

The expression in (23) gives CIs that are conservative in the sense that their confidence level is at least 95%. For simplicity a 95% CI was considered. If a 99% CI was required we would instead search for 99.5% percentiles in the F-distribution.


EX 56 Use the expressions for a CI in (23) and in EX 55 a) and b) to calculate 95% CIs for $p$ in the two cases $(y = 2, n = 20)$ and $(y = 10, n = 100)$.

To use (23) we have to determine the 97.5% percentiles of the F-distribution. This can be a problem since F-tables are often incomplete and percentiles are only shown for a few degrees of freedom. The best one can do is to use statistical software, such as SAS or SPSS, to find the percentiles. In the worst case one may be forced to use linear interpolation.

In the case $(y = 2, n = 20)$ we find $F_{.975}(38, 4) = 8.4191$ and $F_{.975}(6, 36) = 2.7846$. (These values were obtained by using the function finv(0.975, $f_1$, $f_2$) in SAS.) Similarly, in the case $(y = 10, n = 100)$ we get $F_{.975}(182, 20) = 2.1326$ and $F_{.975}(22, 180) = 1.7503$.

In both cases we get the same point estimate of $p$, 2/20 = 10/100 = 10%, but the CIs are different:

                      CI (%) for p
Expression in:        (23)           55 b)          55 a)
(y = 2, n = 20)       (1.2, 31.7)    (2.8, 30.1)    (-3.1, 23.1)
(y = 10, n = 100)     (4.9, 17.6)    (5.5, 17.4)    (4.1, 15.9)

The CIs based on (23) are certainly wider, but they are more reliable since they are conservative as mentioned above. Notice that the expression in 55 a) can result in peculiar CIs in small samples, such as a negative lower limit.
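The three kinds of interval compared in EX 56 can be computed with a few lines of code. The sketch below is an illustration under stated assumptions: the function names are made up for this example, the normal quantile is taken from Python's standard library, and the F percentiles are not computed but passed in as the values quoted in the text.

```python
import math
from statistics import NormalDist

def ci_55a(y, n, level=0.95):
    """Expression in EX 55 a): p_hat +/- C * sqrt(p_hat (1 - p_hat) / n)."""
    c = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = y / n
    half_width = c * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def ci_55b(y, n, level=0.95):
    """Expression in EX 55 b): roots of (p_hat - p)^2 = C^2 p (1 - p) / n."""
    c = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = y / n
    k = c * c / n
    centre = 2 * p_hat + k
    root = math.sqrt(centre ** 2 - 4 * p_hat ** 2 * (1 + k))
    return (centre - root) / (2 * (1 + k)), (centre + root) / (2 * (1 + k))

def ci_23(y, n, f_lower, f_upper):
    """Expression (23).

    f_lower = F.975(2(n - y + 1), 2y) and f_upper = F.975(2(y + 1), 2(n - y))
    must be supplied from tables or statistical software.
    """
    lower = 0.0 if y == 0 else y / (y + (n - y + 1) * f_lower)
    upper = 1.0 if y == n else (y + 1) * f_upper / (n - y + (y + 1) * f_upper)
    return lower, upper

def as_percent(ci):
    return tuple(round(100 * limit, 1) for limit in ci)

# F percentiles as quoted in EX 56.
cases = [(2, 20, 8.4191, 2.7846), (10, 100, 2.1326, 1.7503)]
for y, n, f_lo, f_up in cases:
    print((y, n), as_percent(ci_23(y, n, f_lo, f_up)),
          as_percent(ci_55b(y, n)), as_percent(ci_55a(y, n)))
```

Passing the F percentiles as arguments keeps the sketch free of third-party dependencies; with a statistics package available they could be computed instead.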
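EX 57 Determine a CI for the rate $\lambda$ in a Poisson process.

In Ch. 2.2.2 (c) it was seen that, if $Y(t)$ is a Poisson process of rate $\lambda$, then

$$Z(t) = \frac{Y(t)-\lambda t}{\sqrt{\lambda t}} = \frac{Y(t)/t-\lambda}{\sqrt{\lambda/t}} \xrightarrow{D} Z \sim N(0,1) \quad \text{as } t \to \infty.$$

This statistic is asymptotically pivotal, but there are some difficulties to obtain an inequality for $\lambda$ by using this fact. Instead we notice that the statistic $\hat\lambda = Y(t)/t \xrightarrow{P} \lambda$ as $t \to \infty$. [This follows from (10) in Ch. 3.2.1 since

$$E(\hat\lambda) = \frac{E(Y(t))}{t} = \frac{\lambda t}{t} = \lambda \quad \text{and} \quad V(\hat\lambda) = \frac{V(Y(t))}{t^2} = \frac{\lambda t}{t^2} = \frac{\lambda}{t} \to 0 \text{ as } t \to \infty.]$$

Thus $\sqrt{\hat\lambda/t}\big/\sqrt{\lambda/t} \xrightarrow{P} 1$, and from (11) in Ch. 3.2.1 we get

$$\frac{\hat\lambda-\lambda}{\sqrt{\hat\lambda/t}} = \frac{(\hat\lambda-\lambda)\big/\sqrt{\lambda/t}}{\sqrt{\hat\lambda/t}\big/\sqrt{\lambda/t}} \xrightarrow{D} Z.$$

Now,

$$P\left(-C<\frac{\hat\lambda-\lambda}{\sqrt{\hat\lambda/t}}<C\right)=\begin{cases}P\left(\hat\lambda-C\sqrt{\hat\lambda/t}<\lambda<\hat\lambda+C\sqrt{\hat\lambda/t}\right)\\ \approx P(-C<Z<C)\end{cases}$$

so $\hat\lambda \pm C\sqrt{\hat\lambda/t}$ are the CI limits for $\lambda$, where $C$ is determined from tables of the normal distribution.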
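As a numerical illustration of the limits $\hat\lambda \pm C\sqrt{\hat\lambda/t}$, the sketch below uses hypothetical figures (120 events observed during 40 time units); neither the numbers nor the function name come from the text.

```python
import math
from statistics import NormalDist

def poisson_rate_ci(count, t, level=0.95):
    """Large-sample CI for the rate of a Poisson process observed for time t."""
    c = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for 95%
    lam_hat = count / t                          # point estimate Y(t) / t
    half_width = c * math.sqrt(lam_hat / t)      # C * sqrt(lam_hat / t)
    return lam_hat - half_width, lam_hat + half_width

# Hypothetical observation: 120 events in 40 time units, so lam_hat = 3.
lower, upper = poisson_rate_ci(count=120, t=40.0)
print(f"95% CI for the rate: ({lower:.3f}, {upper:.3f})")
```

Because the approximation relies on $t \to \infty$, the interval should be treated with caution when the observed count is small.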


EX 58 $(Y_i)_{i=1}^n$ are iid variables from an unspecified distribution with mean $\mu$ and variance $\sigma^2$. If $n$ is large a 95% CI for $\mu$ is given by

$$\bar Y \pm 1.96\,\frac{S}{\sqrt n}$$

(This is perhaps the most cited expression in statistical inference and is found in most elementary textbooks. Sometimes 1.96 is replaced by the figure 2.)

Give a rigorous motivation for the expression!

From (9b) $V(S^2) \to 0$ as $n \to \infty$ $\Rightarrow$ $S^2 \xrightarrow{P} \sigma^2$ (Cf. (10) in Ch. 3.3.1) $\Rightarrow$ $g(S^2) = \sqrt{S^2/\sigma^2} \xrightarrow{P} 1$ (Cf. (11) in Ch. 3.3.1). Thus,

$$\frac{\bar Y-\mu}{S/\sqrt n} = \frac{(\bar Y-\mu)\big/(\sigma/\sqrt n)}{S/\sigma} \xrightarrow{D} Z \sim N(0,1).$$

For large $n$,

$$0.95 \approx P\left(-1.96<\frac{\bar Y-\mu}{S/\sqrt n}<1.96\right) = P\left(\bar Y-1.96\,\frac{S}{\sqrt n}<\mu<\bar Y+1.96\,\frac{S}{\sqrt n}\right).$$

The simple expression above should be used with caution. Especially if the population distribution is heavily skewed or has multiple peaks, a very large $n$ would be required.
5.4 Some further topics
5.4.1 Selecting the sample size


Looking back at the examples of this chapter it is seen that the bounds of a CI are functions of the sample size $n$. This opens the possibility to determine $n$ in advance in such a way that the CI has a stipulated length. The problem is that the bounds of a CI are also dependent on the values of one or several statistics that have not yet been computed. There is no simple solution to this problem but some guidelines can be given when the CI has the structure $\hat\theta \pm C\sqrt{\hat V(\hat\theta)}$. The term $C\sqrt{\hat V(\hat\theta)}$ is called Bound on the Error (BE). We consider two cases, a proportion and a mean.


• Bernoulli proportion p in large samples


The CI is $\hat p \pm C\sqrt{\hat p(1-\hat p)/n}$, so $BE = C\sqrt{\hat p(1-\hat p)/n}$. (Division by $n-1$ instead of $n$ is of minor importance.) Values of $\hat p$ can be obtained in several ways:

- Worst case scenario. Choose $\hat p = 1/2$. It is easily shown that this value maximizes $\hat p(1-\hat p)$ for $0 < \hat p < 1$. The maximal BE now becomes $C/(2\sqrt n)$. This solution should only be used when there is no information whatsoever about $p$.

- Qualified guess. Here one uses earlier experience to guess the value of $\hat p$. Notice that the function $\hat p(1-\hat p)$ is symmetric around $\hat p = 1/2$, so e.g. $\hat p = 0.10$ gives the same BE as $\hat p = 0.90$.

- Pilot study. The idea is to take a first small sample (pilot sample) to estimate $p$. Observations from the pilot sample could then be included in the final sample. The approach is appealing since it is free from more or less reliable assumptions. A problem is to decide how large the pilot sample shall be. One solution is to collect data sequentially and compute estimates $\hat p_n$ for increasing $n$ until the estimates have stabilized. Usually this occurs for $n$ less than 20-30.


After having determined an appropriate value of $\hat p$ it is instructive to plot BE on the Y-axis against $n$ on the X-axis for various choices of $C$. (Remember that $C = 1.645, 1.960, 2.575$ corresponds to the confidence levels 90%, 95% and 99%, respectively.) Alternatively, in the expression for BE above one can solve for $n$, giving $n = \hat p(1-\hat p)(C/BE)^2$.


One should be aware that data collection in large samples can be costly. A simple expression for the total cost is $c_0 + c \cdot n$, where $c_0$ is a fixed cost and $c$ is the cost for each sample unit.
EX 59 Determine the sample size needed to get a CI for $p$ with a BE of 0.01 or alternatively 0.025. The CI levels shall be 90%, 95% or 99%.

a) Use the worst case scenario.
b) Use a pilot sample with the data $(y_i)_{i=1}^{15} = (0,0,1,0,0,0,0,1,0,0,0,0,0,0,1)$ where $P(Y_i = 1) = p$.

In a) we use $\hat p = 1/2$.

In b) we conclude from the table below that $\hat p = 0.20$ may be appropriate.

n                                 10     11     12     13     14     15
$\hat p_n = \sum_{i=1}^n y_i/n$   .20    .18    .17    .15    .14    .20

The following table illustrates how the sample size $n$ differs between the two approaches in a) and b):

        a) p = 0.5                      b) p = 0.2
CI level    BE       n          CI level    BE       n
90%         0.01     6765       90%         0.01     4330
95%         0.01     9604       95%         0.01     6147
99%         0.01     16577      99%         0.01     10609
90%         0.025    1082       90%         0.025    693
95%         0.025    1537       95%         0.025    983
99%         0.025    2652       99%         0.025    1697

It is seen that the sample size increases with increasing CI level and decreases with increasing length of the CI. The approach in a) leads to unnecessarily large samples compared with the approach in b).
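The sample sizes in the table follow directly from $n = \hat p(1-\hat p)(C/BE)^2$. The sketch below recomputes them; the function name is an assumption made for this illustration, the constants $C$ are those given in Ch. 5.4.1, and results are rounded to the nearest integer to match the table (rounding up would be the more cautious choice in practice).

```python
def sample_size_proportion(p_guess, be, c):
    """n = p(1 - p)(C / BE)^2, rounded to the nearest integer."""
    return round(p_guess * (1 - p_guess) * (c / be) ** 2)

# C-values for the confidence levels 90%, 95% and 99%.
c_values = {"90%": 1.645, "95%": 1.960, "99%": 2.575}

table = {}
for p_guess in (0.5, 0.2):
    for be in (0.01, 0.025):
        for level, c in c_values.items():
            table[(p_guess, be, level)] = sample_size_proportion(p_guess, be, c)
            print(p_guess, be, level, table[(p_guess, be, level)])

# Running estimates from the pilot sample in b), for n = 10, ..., 15.
pilot = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
running = [round(sum(pilot[:n]) / n, 2) for n in range(10, 16)]
print(running)
```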


• Population mean μ in large samples


In EX 58 it was shown that the limits $\bar Y \pm C \cdot S/\sqrt n$ give a CI for $\mu$ in large samples provided that the observations are iid. The Bound on the Error is $BE = C \cdot S/\sqrt n$, from which $n = C^2S^2/BE^2$. In the latter expression $S$ can be determined in at least two ways.


- 'Empirical rule'. Replace $S^2$ by the true variance $\sigma^2$. Since 99% of the observations are found within the variation limits $\mu \pm 2.58\sigma$, the range of y-values is roughly $2 \cdot 2.58\sigma \approx 5.2\sigma$. From this we get $S \approx \sigma = \text{range}/5.2$. (Sometimes the figure 2.58 is replaced by $1.96 \approx 2$, corresponding to 95% variation limits, which gives $S \approx \text{range}/4$.)

This approach has several drawbacks. There is a great amount of arbitrariness in the choice of coefficient, 2.58 or 2, and sometimes even 3. As a consequence there will be large differences in the choice of $n$. Furthermore, in many cases it can be hard to identify the range of possible y-values.


- Pilot study. As in the case with a Bernoulli proportion, we may take a first small sample to obtain a likely value of $S^2$. Data are collected sequentially until the value of $S^2$ has stabilized.

The calculations can be performed in any of the following ways. For $n > 5$, say, compute

$$\bar y_n = \frac{1}{n}\sum_{i=1}^n y_i \quad \text{and} \quad S_n^2 = \left(\sum_{i=1}^n y_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^n y_i\Bigr)^2\right)\Big/(n-1)$$

or use the recursive relations

$$\bar y_{n+1} = \frac{y_{n+1}+n\,\bar y_n}{n+1}, \qquad n\,S_{n+1}^2 = (n-1)\,S_n^2 + \frac{n}{n+1}\,(y_{n+1}-\bar y_n)^2.$$

(The latter relations are found in Casella & Berger, p. 244.)
EX 60 In order to construct a 95% CI for the mean Area of Optic Disc (AOD) in a group of children, one wants to determine the sample size $n$ that is needed to get a Bound on the Error (BE) of about 0.10 to 0.20 (mm²). In a pilot study the following values were obtained sequentially.

n      1      2      3      4      5      6      7      8      9      10
AOD    2.39   3.33   2.12   1.90   2.66   2.53   2.30   2.98   2.59   2.70

n      11     12     13     14     15
AOD    2.20   2.77   1.86   2.72   3.28

Use the data to first calculate a value of $S^2$ and then suggest a proper sample size.

The sequentially calculated values of $S^2$ are, starting from $n = 7$:

n      7      8      9      10     11     12     13     14     15
S²     0.21   0.21   0.19   0.17   0.16   0.15   0.18   0.17   0.20

An appropriate value seems to be $S^2 = 0.20$, which gives $n = 1.96^2 \cdot 0.20/BE^2$. The desired sample sizes are thus $n = 77$ for $BE = 0.10$ and $n = 20$ for $BE = 0.20$.
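The sequential values of $S^2$ can be reproduced with the recursive relations for the running mean and variance. The following is a minimal sketch: the function name is made up for this illustration, the data are the 15 pilot values of EX 60, and the final sample sizes are rounded up to the next integer.

```python
import math
import statistics

def running_mean_variance(values):
    """Return {n: (mean_n, S2_n)} for n >= 2 using the recursive relations.

    mean_{n+1} = (y_{n+1} + n * mean_n) / (n + 1)
    n * S2_{n+1} = (n - 1) * S2_n + n / (n + 1) * (y_{n+1} - mean_n)^2
    """
    mean, s2, n = values[0], 0.0, 1
    result = {}
    for y in values[1:]:
        s2 = ((n - 1) * s2 + n / (n + 1) * (y - mean) ** 2) / n
        mean = (y + n * mean) / (n + 1)
        n += 1
        result[n] = (mean, s2)
    return result

aod = [2.39, 3.33, 2.12, 1.90, 2.66, 2.53, 2.30, 2.98, 2.59, 2.70,
       2.20, 2.77, 1.86, 2.72, 3.28]

stats = running_mean_variance(aod)
for n in range(7, 16):
    print(n, round(stats[n][1], 2))

# Sample size n = C^2 * S^2 / BE^2 with C = 1.96 and S^2 = 0.20, rounded up.
sizes = {be: math.ceil(1.96 ** 2 * 0.20 / be ** 2) for be in (0.10, 0.20)}
print(sizes)
```

The recursion avoids recomputing the sums from scratch each time a new observation arrives, which is convenient when data are collected one value at a time.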


5.4.2 CI for a function of a parameter


Given a CI for $\theta$, say $\theta_L < \theta < \theta_U$, it is possible to make a CI for a function of $\theta$, $g(\theta)$, provided that the latter is monotonic (decreasing or increasing). The approach is illustrated in the following examples.


EX 61 $(Y_i)_{i=1}^n$ are iid where $Y_i \sim \text{Exponential}(\lambda)$.

a) Determine a 95% CI for $\lambda$ based on the statistic $\sum_{i=1}^n Y_i$. Compute the CI limits when $n = 50$ and $\sum_{i=1}^n y_i = 65.0$.

b) Compute the corresponding CI limits for the survival function $P(Y > y) = e^{-\lambda y}$.

a) $Y_i \sim \text{Gamma}(\lambda, 1)$ $\Rightarrow$ [Ch. 3.1] $\Rightarrow$ $\sum_{i=1}^n Y_i \sim \text{Gamma}(\lambda, n)$ $\Rightarrow$ [Ch. 2.2.2] $\Rightarrow$ $2\lambda\sum_{i=1}^n Y_i \sim \chi^2(2n)$. Thus,

$$0.95 = P\left(a<2\lambda\sum_{i=1}^n Y_i<b\right)=\begin{cases}P\left(\dfrac{a}{2\sum Y_i}<\lambda<\dfrac{b}{2\sum Y_i}\right)\\ P(a<\chi^2(2n)<b)\end{cases}$$

From tables of the Chi-square distribution (e.g. in Wackerly et al 2007, pp. 850-851) we get, with $n = 50$,

$$P(\chi^2(100) > b) = 0.025 \Rightarrow b = 129.56, \qquad P(\chi^2(100) > a) = 0.975 \Rightarrow a = 74.22.$$

The CI for $\lambda$ is thus

$$\left(\frac{74.22}{2 \cdot 65},\ \frac{129.56}{2 \cdot 65}\right) = (0.57,\ 1.00).$$

b) $P(Y > y) = e^{-\lambda y}$ is monotonically decreasing with $\lambda$. Therefore

$$\lambda_L < \lambda < \lambda_U \iff e^{-\lambda_U y} < e^{-\lambda y} < e^{-\lambda_L y}.$$

It follows that the 95% CI for the survival function is $(e^{-1.00y},\ e^{-0.57y})$.

Notice that the latter CI will cover the true value in 95% of all cases at one specific value of $y$. It may be tempting to plot the lower and upper limits against $y$, thereby creating a so-called confidence region. The latter will however not contain the true values in 95% of all the cases, since we are making several confidence statements simultaneously, which in turn will reduce the confidence level. This Multiple inference problem is discussed further in Ch. 6.4.
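The computations in EX 61 are short enough to check by machine. In the sketch below the chi-square percentiles $a = 74.22$ and $b = 129.56$ are simply the table values quoted in the exercise, and the function names are assumptions made for this illustration.

```python
import math

def exponential_rate_ci(total, a, b):
    """CI for lambda from a < 2 * lambda * sum(Y_i) < b."""
    return a / (2 * total), b / (2 * total)

def survival_ci(y, lam_lower, lam_upper):
    """CI for P(Y > y) = exp(-lambda * y); the limits swap because the
    survival function is decreasing in lambda."""
    return math.exp(-lam_upper * y), math.exp(-lam_lower * y)

# n = 50 and sum(y_i) = 65.0, with percentiles of chi-square(100) from tables.
lam_lower, lam_upper = exponential_rate_ci(total=65.0, a=74.22, b=129.56)
print(round(lam_lower, 2), round(lam_upper, 2))

# Pointwise 95% limits for the survival function at a few values of y.
for y in (0.5, 1.0, 2.0):
    print(y, survival_ci(y, lam_lower, lam_upper))
```

Each printed pair is a 95% CI at that single value of $y$ only; taken together over several $y$ they do not form a simultaneous 95% statement.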


In Ch. 2.2.1 (3) it was stated that if $X(s)$ and $Y(t)$ are Poisson processes of rates $\lambda_X$ and $\lambda_Y$, respectively, then the conditional variable

$$\bigl(Y(t) \mid X(s)+Y(t) = n\bigr) \sim \text{Binomial}\left(n,\ p = \frac{\lambda_Y t}{\lambda_X s+\lambda_Y t}\right).$$

This can be used to make a CI for the ratio $R = \lambda_Y/\lambda_X$. Due to its importance we formulate the solution of the problem as a theorem.


A CI for the ratio of two Poisson rates $R = \lambda_Y/\lambda_X$ can be constructed in the following way:

a) First, make a CI for the Binomial proportion $p$, giving $(p_L, p_U)$.

b) A CI for $R$ is then obtained as

$$\frac{p_L}{1-p_L}\cdot\frac{s}{t} < R < \frac{p_U}{1-p_U}\cdot\frac{s}{t} \qquad (24)$$

This follows easily from the fact that $p = \dfrac{Rt}{Rt+s} \Rightarrow R(p) = \dfrac{p\,s}{(1-p)\,t}$, and this is a function that increases monotonically from $R(0) = 0$ to infinity as $p \to 1$.
EX 62 In the snow-free period April-November there were 85 road accidents on a certain stretch of road and during the winter period December-March there were 65 road accidents on the same stretch. Is the rate of road accidents significantly higher during the winter period?

Introduce the notations

X(8) = Number of accidents in the snow-free period (8 months), of rate $\lambda_X$
Y(4) = Number of accidents in the winter period (4 months), of rate $\lambda_Y$

We will answer the question about significance by making a 95% CI for $R = \lambda_Y/\lambda_X$. If the latter does not cover 1 we draw the conclusion that there is a significant difference.

The observed proportion $Y(4)/(Y(4)+X(8))$ is $\hat p = 65/(65+85) = 0.4333$. Since $n$ is large we use the expression $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}$ for a 95% CI in EX 55 a). With 1.96 rounded to 2 this yields the limits $0.4333 \pm 0.0809$ or the CI $(0.3524, 0.5142)$. The expression in (24) finally gives the CI limits for $R$:

$$R_L = \frac{0.3524}{1-0.3524}\cdot\frac{8}{4} = 1.09, \qquad R_U = \frac{0.5142}{1-0.5142}\cdot\frac{8}{4} = 2.12.$$

Since the latter interval does not cover 1 and is in fact located above 1, the conclusion is that the rate of road accidents is significantly higher in the winter period.

In Ch. 6 we will consider other ways to claim statistical significance.
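The two-step construction in (24) is easy to script. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the CI for $p$ is the simple large-sample one from EX 55 a), and the multiplier $C$ is left as a parameter because the result depends slightly on whether 1.96 or the rounded value 2 is used (limits near (1.10, 2.10) and (1.09, 2.12), respectively).

```python
import math

def poisson_rate_ratio_ci(y_count, x_count, t, s, c=1.96):
    """CI for R = lambda_Y / lambda_X via the conditional Binomial proportion.

    y_count events are observed during time t, x_count events during time s.
    """
    n = x_count + y_count
    p_hat = y_count / n
    half_width = c * math.sqrt(p_hat * (1 - p_hat) / n)
    p_lower, p_upper = p_hat - half_width, p_hat + half_width

    def ratio(p):
        # R(p) = p * s / ((1 - p) * t), increasing in p.
        return p * s / ((1 - p) * t)

    return ratio(p_lower), ratio(p_upper)

# Road accident counts from EX 62: 65 in 4 winter months, 85 in 8 snow-free months.
r_lower, r_upper = poisson_rate_ratio_ci(y_count=65, x_count=85, t=4, s=8)
print(round(r_lower, 2), round(r_upper, 2))

r_lower_2, r_upper_2 = poisson_rate_ratio_ci(65, 85, t=4, s=8, c=2.0)
print(round(r_lower_2, 2), round(r_upper_2, 2))
```

With either multiplier the interval lies entirely above 1, so the conclusion about a higher winter rate does not hinge on the rounding.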


5.5 Final words


Verify that you can find the points $a$ and $b$ in the $\chi^2$- and F-distributions such that 2.5% of the observations are smaller than $a$ and 2.5% of the observations are larger than $b$. The intervals in the examples of Ch. 5 are 95% CIs. Change the confidence levels to 90% and 99% to study the effect on the lengths of the CIs.


Remember the interpretation of a CI. If you repeatedly construct 95% CIs, then in the long run 1 interval out of 20 will fail to cover the true parameter value.


Notice that proportions around 1/2 require the largest sample size for a given confidence level and Bound on the Error. Many people do not agree with this. Therefore you should go through the arguments in Ch. 5.4.1, so you can persuade them.
Supplementary Exercises, Ch. 5


EX 63 The following data show body weight (kg) of 10 males before (X) and after (Y) participating in a training program with the purpose of reducing weight.


Subject   1      2      3      4       5      6      7      8      9      10
X         88.3   94.6   88.4   102.5   94.3   79.3   86.3   96.9   88.5   101.8
Y         88.1   93.5   88.5   102.0   94.7   78.5   86.1   96.2   88.2   101.1


a) Does the training program have a significant effect on weight loss? Answer the question by drawing a conclusion from a 95% CI for the average weight loss.
[Hint: Just look at the differences within subjects. Don't use the approach in EX 53. Why?]

b) When the same training program was used by a population of females it was found that the variance of the weight loss was 0.7. Does the latter value differ significantly from the variance obtained for males?

c) Give a 95% CI for the proportion of males that lose weight. Compare the results that are obtained by using the expressions in EX 55 a) and in (23).

d) As expected, the CI in c) becomes very wide. Consider the sample above as a pilot sample and determine the sample size needed to get a 95% CI with a Bound on the Error that is 0.025.


EX 64 The data below summarize measurements of Area of Optic Disc (AOD) in mm² from two samples of children called FAS and Control. Children in the FAS (Fetal Alcohol Syndrome) group had mothers who were high consumers of alcohol during pregnancy.


               FAS      Control
Sample size    22       30
Mean           2.01     2.55
Variance       0.3623   0.2305


Determine a 95% CI for the difference of mean AOD between the two groups.


EX 65 Let $(Y_i)_{i=1}^n$ be iid where $Y_i \sim \text{Exponential}(\lambda)$.


a) Determine a 95% CI for $\lambda$ based on the fact that $(\bar Y - E(\bar Y))/\sqrt{V(\bar Y)} \xrightarrow{D} Z \sim N(0,1)$.
b) Compute the expected length of the CI in a) when $n = 50$. Compare the latter with the expected length of the CI in EX 61 a).


EX 66 During an epidemic a sample of five institutions at a university was randomly selected. These were asked how many of their employees were on the sick-list. The result was


Institution    1    2    3    4    5
Sick-listed    4    10   8    2    6
Total staff    10   42   25   11   12


Give a 95% CI for the total proportion sick-listed at the university.


[Hint: Use the ML estimator in EX 46 together with the CLT.]


EX 67 The number of bacteria (per cm*) in a certain type of food varies according to a Poisson distribution. In a 
sample of 4 units one obtained the following result 


Unit 1 2 3 4 


Number of bacteria 103 112 91 117 


Determine a 95% CI for the mean number of bacteria.


[Hint: Use the asymptotic normality of the Poisson distribution.] 
6 Hypothesis Testing


In Ch. 5 we considered one way to claim statistical significance, namely to construct a CI for an unknown parameter. In this chapter we will meet another way to claim significance, by setting up hypotheses about parameters and seeing whether these are in accordance with data. There are mainly two ways to do this, the p-value approach and the rejection region approach. Both of these are described below.


6.1 Concepts
6.1.1 p-value approach


In the p-value approach a basic hypothesis, called the null hypothesis $H_0$, is formulated about one or several parameters. In the next step a statistic $T = T(Y_1, \ldots, Y_n)$, called a test statistic, is chosen and the value taken by $T$ in a specific sample determines whether $H_0$ shall be rejected or not. The precise way in which this is done is illustrated in the following example.


EX 68 Let $p$ be the proportion of born boys in a certain population. We want to test the hypothesis $H_0: p = 1/2$. To this end we take a sample of $n$ newborn children and calculate the value of the test statistic $\hat p = X/n$, where $X$ is the number of born boys in the sample. If the value of $\hat p$ deviates 'very much' from the value specified by $H_0$ we should reject $H_0$. But what is the meaning of 'very much'; is e.g. $X = 7$ out of $n = 10$ enough?

For assistance in this matter we calculate the p-value $= P(X \ge 7 \mid p = 1/2)$ where it can be assumed that $X \sim \text{Binomial}(10, 1/2)$. Thus (Cf. Ch. 2.2.1 (2))

$$\text{p-value} = \binom{10}{7}(1/2)^7(1/2)^3 + \binom{10}{8}(1/2)^8(1/2)^2 + \binom{10}{9}(1/2)^9(1/2)^1 + \binom{10}{10}(1/2)^{10}(1/2)^0 = 176 \cdot (1/2)^{10} = 0.1719.$$

The latter is called a one-sided (one-tailed) p-value. But there is nothing a priori that says that a deviation from $H_0$ only goes in one direction in this case. We should therefore also calculate $P(X \le 3 \mid p = 1/2) = 0.1719$. (The Binomial pf is symmetric for $p = 1/2$.)

The two-sided p-value is thus $0.1719 + 0.1719 = 0.34$. The latter is the probability of getting deviations from $H_0$ at least as extreme as the observed one by mere chance, and it is quite large.

Assume now that we instead have observed $X = 70$ out of $n = 100$. Since $(\hat p - p)/\sqrt{p(1-p)/n} \xrightarrow{D} Z \sim N(0,1)$, we can calculate a two-sided p-value in the following way:

$$P(\hat p \ge 0.70 \mid p = 1/2) = P\left(\frac{\hat p - 1/2}{\sqrt{1/400}} \ge \frac{0.70 - 1/2}{\sqrt{1/400}}\right) \approx P(Z \ge 4.00) = 0.00003 = P(\hat p \le 0.30 \mid p = 1/2).$$

Thus, p-value $= 2 \cdot 0.00003 = 0.00006$, which is very small.
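Both p-value calculations in EX 68 can be reproduced with the standard library. The sketch below is only an illustration; the helper name is an assumption, and doubling the one-sided tail is justified here because the Binomial distribution is symmetric when $p = 1/2$.

```python
import math
from statistics import NormalDist

def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(x, n + 1))

# Exact calculation for X = 7 out of n = 10 under H0: p = 1/2.
one_sided = binomial_upper_tail(7, 10, 0.5)
two_sided = 2 * one_sided
print(round(one_sided, 4), round(two_sided, 2))

# Normal approximation for X = 70 out of n = 100 under H0: p = 1/2.
p0, n = 0.5, 100
z = (70 / n - p0) / math.sqrt(p0 * (1 - p0) / n)
approx_two_sided = 2 * (1 - NormalDist().cdf(z))
print(round(z, 2), approx_two_sided)
```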
A p-value is thus the probability of obtaining a value of the test statistic at least as extreme as the one that is observed, provided that $H_0$ holds. Some comments on this.


A p-value should normally be two-sided. Exceptions are when it is obvious that deviations can only go in one direction. If you present one-sided p-values in a paper that you send to a scientific journal, it is likely that it will be returned since referees often want two-sided p-values.

It is customary to reject $H_0$ when the p-value is less than 0.05. Here is some frequently used terminology for this:


0.05 < p < 0.10     'Weak significance'
0.01 < p < 0.05     'Significance'                *
0.001 < p < 0.01    'Strong significance'         **
p < 0.001           'Very strong significance'    ***


The concept of 'Weak significance' can be found in areas such as Psychology and Sociology where sample sizes are sometimes small and the p-values become large only for this reason. The use of stars, similar to the classification of brandy, has been popular in medical studies, but should be avoided. It has actually happened that the stars have been confused with footnotes.


A p-value expresses the degree of evidence against $H_0$ that is found in the present sample and nothing else. Hypotheses such as $p = 1/2$ for the proportion of heads when tossing a coin, or mean = 0 for the difference in means between two groups, can strictly speaking be rejected without data. (1/2 is not the same as 0.5 or something with more decimals, it is exactly one divided by two.) These hypotheses can always be rejected by choosing $n$ sufficiently large. Consider a study conducted some years ago of the effect of physical activity on the risk of getting heart disease. An 'active' group consisting of 30 000 subjects and a 'control' group of 20 000 subjects were followed in time and the proportion of heart diseases was reported in each group. In the study a p-value just below 0.05 was obtained for the hypothesis 'no difference between the proportions of heart disease in the two groups'. Newspapers reported that it is now proved that physical activity has a statistically significant positive effect on the risk of heart disease. The author's personal reaction to this as a statistician is that, if such a large amount of data was needed to get a p-value below 5%, then the true difference must be marginal.

The p-value concept seems to have been first used by Laplace in the 1770s when studying the excess of born boys compared to girls. It was later popularized by R. A. Fisher in the 1920s, who invented the term test of significance for this approach. It was later displaced by the rejection region approach, to be described in the next section. In recent years the p-value approach has regained its leading position. This is probably due to the rapid development of computer programs by means of which the computation of p-values is easy, something that wasn't the case 30-50 years ago. Today most statistical software supplies its users with a variety of p-values, obtained by using various test statistics and under various assumptions. This in turn has increased the need for a higher level of statistical knowledge.
Exercises in Statistical Inference
with detailed solutions Hypothesis Testing


6.1.2 Rejection region approach 


As before there is a null hypothesis $H_0$ and a test statistic $T$. Now there is furthermore an alternative hypothesis $H_a$ and a rejection region, RR, such that if $T$ takes a value within RR then $H_0$ is rejected and $H_a$ is accepted. (RR is sometimes called a critical region.) By using the symbol $\in$ (belongs to) this can be expressed as 'Reject $H_0$ if $T \in RR$'. Two types of errors can be made in reaching a decision. A type I error is made if $H_0$ is rejected when $H_0$ is true. The probability of this event is denoted $\alpha$ and it is customary to require that $\alpha \le 0.05$. A type II error is made if $H_0$ is accepted when $H_a$ is true. The probability of the latter event is denoted $\beta$.


An important concept is that of a power function, which is the probability of rejecting $H_0$. If $\theta$ is the parameter that is specified by $H_0$, then the power is $Pow(\theta) = P(T \in RR)$. Under $H_0: \theta = \theta_0$, $Pow(\theta_0) = \alpha$. The latter equality is seldom possible to achieve when the test statistic has a discrete distribution and in that case it is required that $Pow(\theta_0) \le \alpha$. In general the power depends on: (i) $\theta$, (ii) the sample size $n$, (iii) the choice of RR and (iv) the choice of test statistic $T$. The best test statistic is the one that maximizes the power for given $\theta$, $n$ and RR. This is often based on the best estimator. (Stuart et al 1999, Ch. 22.36.)
EX 69 Consider the test statistic $Y$ = 'Number of born boys' $\sim \text{Binomial}(n = 10, p)$ that is used for testing $H_0: p = 1/2$ against $H_a: p \ne 1/2$.

Compute the power for each of the RRs: (i) $\{0, 10\}$, (ii) $\{0, 1, 9, 10\}$, (iii) $\{0, 1, 2, 8, 9, 10\}$. Suggest a proper RR.

(i) $Pow_1(p) = P(Y = 0) + P(Y = 10) = \binom{10}{0}p^0(1-p)^{10} + \binom{10}{10}p^{10}(1-p)^0 = (1-p)^{10} + p^{10}$.

(ii) $Pow_2(p) = P(Y \le 1) + P(Y \ge 9) = Pow_1(p) + P(Y = 1) + P(Y = 9) = Pow_1(p) + \binom{10}{1}p(1-p)^9 + \binom{10}{9}p^9(1-p) = Pow_1(p) + 10p(1-p)\bigl((1-p)^8 + p^8\bigr)$.

(iii) $Pow_3(p) = P(Y \le 2) + P(Y \ge 8) = Pow_2(p) + \binom{10}{2}p^2(1-p)^8 + \binom{10}{8}p^8(1-p)^2 = Pow_2(p) + 45p^2(1-p)^2\bigl((1-p)^6 + p^6\bigr)$.

The three power curves are shown in Figure 1.

Of special interest is to compute the power under $H_0$, $\alpha_i = Pow_i(p = 1/2)$, $i = 1, 2, 3$.

$\alpha_1 = 0.0020$, $\alpha_2 = 0.0215$, $\alpha_3 = 0.1094$. Since the latter value is larger than 0.05 we can't use the corresponding RR. The RR $\{0, 1, 9, 10\}$ is to be preferred since it has a power that is less than 0.05 under the null hypothesis and it is uniformly larger than the power in (i).
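The power of each rejection region can be evaluated directly from the Binomial probabilities. The sketch below is an illustration with assumed names; it evaluates the three power functions at $p = 1/2$, which gives the type I error probabilities, and at $p = 0.4$ as one example of an alternative.

```python
import math

def power(p, rejection_region, n=10):
    """P(Y in RR) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in rejection_region)

regions = {
    "(i)": {0, 10},
    "(ii)": {0, 1, 9, 10},
    "(iii)": {0, 1, 2, 8, 9, 10},
}

for label, rr in regions.items():
    alpha = power(0.5, rr)        # power under H0: p = 1/2
    at_alternative = power(0.4, rr)
    print(label, round(alpha, 4), round(at_alternative, 4))
```

Evaluating `power` on a grid of $p$-values between 0 and 1 gives the points needed to draw power curves of the kind referred to as Figure 1.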


6.2 Methods of finding tests 


In Ch. 4.3 some methods for finding point estimators were presented. We now consider some methods 
that can guide us when testing hypotheses. These are based on the Chi-square principle, the Likelihood- 


ratio (LR) principle and on miscellaneous methods. 


6.2.1 The Chi-square principle 


This requires that data are classified. For measurements on a continuous variable we thus have to create classes. How this can be done is illustrated in EX 109 below. We thus have the following data


Class                       1      2      ...    k      Total
Observed frequency          Y_1    Y_2    ...    Y_k    ΣY_i = n
Hypothetical probability    p_1    p_2    ...    p_k    Σp_i = 1


Here the hypothetical probability $p_i$ is the probability of belonging to class $i$ under $H_0$. Examples of such probabilities are given in the examples below. In general the null hypothesis can be formulated $H_0: p_i = p_i(\theta_1, \theta_2, \ldots)$, where $\theta_1, \theta_2, \ldots$ are unknown parameters that need to be estimated (by the ML method), giving $\hat\theta_1, \hat\theta_2, \ldots$. The Chi-square statistic is

$$X^2 = \sum_{i=1}^k \frac{\bigl(Y_i - n \cdot p_i(\hat\theta_1, \hat\theta_2, \ldots \mid H_0)\bigr)^2}{n \cdot p_i(\hat\theta_1, \hat\theta_2, \ldots \mid H_0)} \qquad (25)$$

As $n \to \infty$, $X^2 \xrightarrow{D} \chi^2(k-1-a)$ under $H_0$. Here $a$ is the number of linearly independent parameters to estimate.


In practice (25) is used in the following way: Compute the value of $X^2$, giving $X^2_{obs}$. Then calculate the p-value $P(\chi^2(k-1-a) > X^2_{obs})$ and reject $H_0$ if the latter is smaller than 0.05.


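EX 70 Given the following data


Class                 1      2      Total
Observed frequency    Y_1    Y_2    n
Probability           1-p    p      1


Test $H_0: p = 1/2$ against $H_a: p \ne 1/2$.


There are no parameters to estimate under $H_0$.

$$X^2 = \frac{(Y_1 - n \cdot 1/2)^2}{n \cdot 1/2} + \frac{(Y_2 - n \cdot 1/2)^2}{n \cdot 1/2}$$

gives $X^2_{obs}$, and p-value $= P(\chi^2(2-1-0) > X^2_{obs})$. Notice that, with $\hat p = Y_2/n$, $X^2$ can also be written

$$X^2 = \left(\frac{\hat p - 1/2}{\sqrt{1/(4n)}}\right)^2.$$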


Contingency tables (R x C cross tables).


The following frequency table, often called a 2 x 2 table, is a convenient way to summarize data.


                    Factor II
                    1        2        Total
Factor I    1       Y_11     Y_12     Y_1+
            2       Y_21     Y_22     Y_2+
Total               Y_+1     Y_+2     n


Here there are two 'Factors', each divided into two categories. Examples are when the factors are two doctors who classify the same $n$ patients as either 'Healthy' (1) or 'Diseased' (2), or when the political opinion (left or right wing) of $n$ voters is measured at two times (Factor I and II). The table can be generalized to an R x C table with R rows and C columns. It can also be extended to more than two factors, e.g. when the political opinion of $n$ voters about $P > 2$ parties is measured at $T > 2$ times. In such a case the sample is called a Panel. Notice that all cell frequencies are random, except for the fixed sample size $n$.


Corresponding to the 2 x 2 table of frequencies, there is a 2 x 2 table of probabilities


                    Factor II
                    1        2        Total
Factor I    1       p_11     p_12     p_1+
            2       p_21     p_22     p_2+
Total               p_+1     p_+2     1


The 'Total' probabilities are called marginal proportions (probabilities). Notice that there are three genuine (or linearly independent) parameters $p_{ij}$ and two genuine marginal proportions, since the sum equals 1. In the R x C table there are $R \cdot C - 1$ genuine parameters $p_{ij}$ and $R - 1 + C - 1$ genuine marginal proportions.


The frequencies in a 2 x 2 table are distributed Multinomial$(n, p_{11}, p_{12}, p_{21}, p_{22})$ (Cf. Ch. 2.2.1 (6)), so the probability of the outcomes, or Likelihood if we are interested in the parameters, can be written

$$L = (\text{const.})\; p_{11}^{y_{11}}\, p_{12}^{y_{12}}\, p_{21}^{y_{21}}\, p_{22}^{y_{22}}.$$

We will consider two types of hypotheses:


Equality of marginal proportions. $H_0: p_{1+} = p_{+1}$. This is easily seen to be identical with $H_0: p_{12} = p_{21}\ (= p)$. Under $H_0$ there are 2 genuine parameters to estimate.


Independency between Factor I and Factor II. In this case $H_0: p_{ij} = p_{i+} \cdot p_{+j}$. Under $H_0$ there are 2 genuine parameters to estimate.
EX 71 Test of equal marginal proportions in the 2 x 2 table by means of McNemar's test.

As just noticed we shall test $H_0: p_{1+} = p_{+1} \Leftrightarrow H_0: p_{21} = p_{12}\ (= p)$. Let the Likelihood under $H_0$ be denoted $L_0$. Then

$L_0 = (\text{const.})\, p_{11}^{y_{11}}\, p^{\,y_{21}+y_{12}}\, (1 - p_{11} - 2p)^{y_{22}}$. (Notice that we only keep the genuine parameters.)

$\ln L_0 = \ln(\text{const.}) + y_{11}\ln p_{11} + (y_{21}+y_{12})\ln p + y_{22}\ln(1 - p_{11} - 2p)$

$\frac{d\ln L_0}{dp_{11}} = \frac{y_{11}}{p_{11}} - \frac{y_{22}}{1 - p_{11} - 2p} = 0, \qquad \frac{d\ln L_0}{dp} = \frac{y_{21}+y_{12}}{p} - \frac{2y_{22}}{1 - p_{11} - 2p} = 0$

We thus have the equations

$y_{11}/p_{11} = y_{22}/(1 - p_{11} - 2p)$   (1)

$(y_{21}+y_{12})/p = 2y_{22}/(1 - p_{11} - 2p)$   (2)

Since these may be somewhat tricky to solve we show an example of a solution.

Multiplying Eq. (1) by 2 and taking the difference between the left- and right-hand sides in the two equations yields

$(y_{21}+y_{12})/p - 2y_{11}/p_{11} = 0 \Rightarrow p_{11} = 2p\,y_{11}/(y_{21}+y_{12})$, which inserted into Eq. (2) gives

$\frac{y_{21}+y_{12}}{2p} = \frac{y_{22}}{1 - 2p\,y_{11}/(y_{21}+y_{12}) - 2p}$.

So, $(y_{21}+y_{12}) = 2p\,(y_{11}+y_{22}+y_{21}+y_{12}) = 2p\cdot n \Rightarrow \hat p = \frac{y_{21}+y_{12}}{2n}$, and this inserted into Eq. (1) gives $\hat p_{11} = y_{11}/n$ and finally $\hat p_{22} = 1 - \hat p_{11} - 2\hat p = y_{22}/n$.

According to the Chi-square principle, $X^2 = \sum_{i,j} \frac{(Y_{ij} - n\cdot(\hat p_{ij}|H_0))^2}{n\cdot(\hat p_{ij}|H_0)}$. Here

$Y_{11} - n\cdot(\hat p_{11}|H_0) = Y_{11} - n\frac{Y_{11}}{n} = 0$,

$Y_{12} - n\cdot(\hat p_{12}|H_0) = Y_{12} - n\frac{(Y_{21}+Y_{12})}{2n} = \frac{(Y_{12}-Y_{21})}{2}$,

$Y_{21} - n\cdot(\hat p_{21}|H_0) = Y_{21} - n\frac{(Y_{21}+Y_{12})}{2n} = \frac{(Y_{21}-Y_{12})}{2}$,

$Y_{22} - n\cdot(\hat p_{22}|H_0) = Y_{22} - n\frac{Y_{22}}{n} = 0$.

Thus

$X^2 = 0 + \frac{((Y_{12}-Y_{21})/2)^2}{(Y_{21}+Y_{12})/2} + \frac{((Y_{21}-Y_{12})/2)^2}{(Y_{21}+Y_{12})/2} + 0 = \frac{(Y_{12}-Y_{21})^2}{Y_{12}+Y_{21}}$.

This test statistic was derived by McNemar (McNemar 1947, p. 153) and has been termed McNemar's Test.

Under $H_0$ the statistic is distributed $\chi^2(4-1-2) = \chi^2(1)$ in large samples with p-value $= P(\chi^2(1) > X^2_{obs})$.
EX 72 Test of independency

In the 2 x 2 table the hypothesis of independency is $H_0: p_{ij} = p_{i+}\cdot p_{+j}$, for $i, j = 1, 2$. Under $H_0$ the likelihood is

$L_0 = (\text{const.})\,(p_{1+} p_{+1})^{y_{11}} (p_{1+} p_{+2})^{y_{12}} (p_{2+} p_{+1})^{y_{21}} (p_{2+} p_{+2})^{y_{22}}$.

Since we only want genuine parameters, put $p_{2+} = 1 - p_{1+}$ and $p_{+2} = 1 - p_{+1}$. Taking logarithms gives

$\ln L_0 = \ln(\text{const.}) + y_{11}(\ln p_{1+} + \ln p_{+1}) + y_{12}(\ln p_{1+} + \ln(1 - p_{+1})) + y_{21}(\ln(1 - p_{1+}) + \ln p_{+1}) + y_{22}(\ln(1 - p_{1+}) + \ln(1 - p_{+1}))$.

$\frac{d\ln L_0}{dp_{1+}} = \frac{y_{11}}{p_{1+}} + \frac{y_{12}}{p_{1+}} - \frac{y_{21}}{1-p_{1+}} - \frac{y_{22}}{1-p_{1+}} = \frac{y_{1+}}{p_{1+}} - \frac{y_{2+}}{1-p_{1+}} = 0$

$\frac{d\ln L_0}{dp_{+1}} = \frac{y_{11}}{p_{+1}} - \frac{y_{12}}{1-p_{+1}} + \frac{y_{21}}{p_{+1}} - \frac{y_{22}}{1-p_{+1}} = \frac{y_{+1}}{p_{+1}} - \frac{y_{+2}}{1-p_{+1}} = 0$

From this we get $y_{1+} - y_{1+}p_{1+} = y_{2+}p_{1+} \Rightarrow \hat p_{1+} = \frac{y_{1+}}{y_{1+}+y_{2+}} = \frac{y_{1+}}{n}$ and similarly $\hat p_{+1} = \frac{y_{+1}}{n}$. Also notice that $\hat p_{2+} = 1 - \hat p_{1+} = 1 - \frac{y_{1+}}{n} = \frac{y_{2+}}{n}$ and similarly $\hat p_{+2} = \frac{y_{+2}}{n}$.

In the Chi-square statistic (25), $n\cdot(\hat p_{ij}|H_0) = n\cdot(\hat p_{i+}\hat p_{+j}) = \frac{Y_{i+}Y_{+j}}{n}$. Thus we obtain the statistic

$X^2 = \sum_{i,j=1}^{2} \frac{(Y_{ij} - Y_{i+}Y_{+j}/n)^2}{Y_{i+}Y_{+j}/n}$.

In large samples this is distributed $\chi^2(4-1-2) = \chi^2(1)$ and the p-value is $P(\chi^2(1) > X^2_{obs})$.

In the table with R rows and C columns the Chi-square statistic remains the same, but now the number of degrees of freedom is changed to $R\cdot C - 1 - (R-1) - (C-1) = (R-1)(C-1)$. The p-value is now obtained as $P(\chi^2((R-1)(C-1)) > X^2_{obs})$.


When the hypothesis of independence between the two factors is rejected, one should go further in the analysis and determine which combinations of levels of the factors contribute to the dependency. This can be done by considering $D_{ij} = Y_{ij} - Y_{i+}Y_{+j}/n$. The latter is called the Deviation and is supplied by many statistical software packages. If $D_{ij}$ is greater than zero or below zero there is an over- or under-representation, respectively, of observations in cell (i, j). Since deviations may be due merely to chance, one should study whether the deviation is significantly different from zero (a 'significant deviation').

A statistic for this purpose is the Cell Chi-square defined by

$\frac{(Y_{ij} - Y_{i+}Y_{+j}/n)^2}{Y_{i+}Y_{+j}/n}$.

In large samples this is distributed as a $\chi^2(1)$ variable (Cochran 1954, p. 417). This means that if the Cell Chi-square is larger than 3.84, then the deviation is significant at the 5% level. The single cell statistic is supplied by statistical software, e.g. in SAS where it is denoted 'Cell Chi-Square'.


When analyzing deviations in many cells one should be aware of the risk of making wrong decisions 
due to the multiple inference context. When several conclusions are to be drawn simultaneously with 
5% significance one has to adjust the individual significance level so that the global level is maintained 
at 5%. This is explained further in Ch. 6.4. 


We now turn to another application of the chi-square principle, the test of fit. In this case the null hypothesis specifies that data have a certain distribution. In (25) $Y_i$ are the observed frequencies which are to be compared with the hypothetical ones under $H_0$.
EX 73 A test of randomness for binary data.

A sequence of binary digits starts with 0,0,0,0,1,1,... and ends with ...,0,0,1,0,0,1. We want to study whether these occur in a random order. There are several ways to do this, but one is the following: Define the variable Y = Number of digits until the first '1' occurs. The observations on Y are

Y            1     2     3     4     5     6     10
Frequency    26    13    9     2     1     1     1

From Ch. 2.2.1 (3) it follows that in this case $H_0$: 'Digits are in random order' is the same as $H_0: Y \sim$ Geometric(p), where p is the probability of '1'.

In EX 47 it is seen that the ML estimate of p is $\hat p = 1/\bar y$. From the table above we get

$\bar y = \frac{\sum y_i}{n} = \frac{26\cdot 1 + 13\cdot 2 + \ldots + 1\cdot 10}{26 + 13 + \ldots + 1} = \frac{108}{53} \Rightarrow \hat p = \frac{53}{108} = 0.49$.

The estimated expected frequency under $H_0$ of the outcome 'Y = y' is $n\cdot(1-\hat p)^{y-1}\hat p$, $y = 1, 2, \ldots$ E.g. the expected frequency of the outcome 'Y = 2' is $53\cdot(1-0.49)^{2-1}\cdot 0.49 = 13.2$. One obtains the following table

y                     1       2       3      4      5
Expected frequency    26.0    13.2    6.8    3.4    1.8

Here the expected frequencies of the outcomes Y = 5 or larger are small so we pool them in the following way: 53 - (26.0 + 13.2 + 6.8 + 3.4) = 3.6. We now get the table

y                     1       2       3      4      5-
Expected frequency    26.0    13.2    6.8    3.4    3.6
Observed frequency    26      13      9      2      3

$X^2_{obs} = \frac{(26-26.0)^2}{26.0} + \ldots + \frac{(3-3.6)^2}{3.6} = 1.39 \Rightarrow$ p-value $= P(\chi^2(5-1-1) > 1.39) \gg 0.10$.

There is thus no reason to reject the hypothesis of randomness.

Comment to EX 73 It has been recommended that expected frequencies under $H_0$ shall be larger than 2 in the Chi-square test of fit (Stuart et al 1999, p. 409); earlier recommendations were larger than 5. In EX 73 all expected frequencies for y larger than 4 are definitely too small. At y = 3 there is an over-representation of observed frequencies with Deviation = 2.2, but this isn't serious since the cell Chi-square statistic is only 0.71.
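The computations in EX 73 can be retraced with a short script. The following is only a sketch in Python (a language chosen here, not one used in the book); the function name `chi2_sf_df3` is made up for the illustration and relies on the closed-form upper tail of the chi-square distribution with 3 degrees of freedom. Because the script keeps the unrounded estimate 53/108 instead of 0.49, the statistic comes out slightly different from the hand-computed 1.39, but the conclusion is the same.

```python
import math

def chi2_sf_df3(x):
    """Upper tail P(chi2(3) > x), closed form valid for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Observed data from EX 73: value of Y -> frequency
observed = {1: 26, 2: 13, 3: 9, 4: 2, 5: 1, 6: 1, 10: 1}

n = sum(observed.values())
total = sum(y * f for y, f in observed.items())
p_hat = n / total                      # ML estimate 1 / y-bar

# Expected frequencies for y = 1..4; the remaining mass is pooled into "5 or more"
expected = [n * (1 - p_hat) ** (y - 1) * p_hat for y in range(1, 5)]
expected.append(n - sum(expected))
obs_pooled = [observed[y] for y in range(1, 5)]
obs_pooled.append(n - sum(obs_pooled))

x2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, expected))
p_value = chi2_sf_df3(x2)              # df = 5 classes - 1 - 1 estimated parameter
print(round(x2, 2), round(p_value, 2))
```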
EX 74 Challenge the computer in 'thinking randomly'.

The original series in EX 73 was actually made by a random number generator. (The function ranbin(0,1,p) with p = 1/2 in SAS.)

It is a challenge to try to beat the computer in 'thinking randomly', as measured by the value $X^2_{obs}$.

Write down a sequence of slightly more than one hundred 0's and 1's. Try to place them in 'random' order, and repeat the analysis made in EX 73. You will probably not beat the computer and it is even likely that the sequence you have created will be rejected as random.

A tip! In time you will learn from the table of observed and expected frequencies how to improve your skill. Then it is time to challenge your friends in tournaments.


EX 75 The following table shows the treatment times (minutes) for patients at a clinic.

Treatment time    0-10    10-20    20-30    30-40    40-    Total
Frequency         10      16       13       6        5      50

Mean = 20, Variance = 140, Max. value = 45

Test whether the treatment times have a Uniform(b) distribution.

The cdf is $F(y) = y/b$, $0 < y < b$, and a (bias-corrected) ML estimate of b is $\hat b = \frac{n+1}{n}\,y_{(n)} = \frac{51}{50}\cdot 45 = 45.9$.

Thus, the estimated cdf is $\hat F(y) = y/45.9$. The expected frequencies are

$50\cdot P(0 < Y < 10) = 50(\hat F(10) - \hat F(0)) = 10.9$, $50\cdot P(10 < Y < 20) = 50(\hat F(20) - \hat F(10)) = 10.9$,
$50\cdot P(20 < Y < 30) = 50(\hat F(30) - \hat F(20)) = 10.9$, $50\cdot P(30 < Y < 40) = 50(\hat F(40) - \hat F(30)) = 10.9$,
$50\cdot P(Y \ge 40) = 50(1 - \hat F(40)) = 6.4$. Thus,

Treatment time        0-10    10-20    20-30    30-40    40-
Expected frequency    10.9    10.9     10.9     10.9     6.4
Observed frequency    10      16       13       6        5

$X^2_{obs} = \frac{(10-10.9)^2}{10.9} + \ldots + \frac{(5-6.4)^2}{6.4} = 5.37$, p-value $= P(\chi^2(5-1-1) > 5.37) = 0.15$.

This p-value isn't small enough to reject the hypothesis of a Uniform(b) distribution. However, the sample size is small and there are some suspicious signs of a positive deviation for the cell 10-20 and a negative deviation for the cell 30-40 (although neither is significant). There seem to be reasons to search for a more realistic probability model.
EX 76 Check if a Gamma($\lambda$, k) distribution gives a better fit to the data in EX 75.

ML estimates for this distribution are quite laborious to obtain (Cf. EX 44), therefore we confine ourselves to Moment estimates (Cf. EX 38): $\hat\lambda = \bar y/s^2 = 1/7$ and $\hat k = \bar y^2/s^2 = 2.9 \approx 3$.

These estimates inserted into the cdf (Cf. Ch. 2.2.2 (2)) $F(y) = 1 - e^{-\lambda y}\sum_{i=0}^{k-1} (\lambda y)^i/i!$ give $\hat F(y) = 1 - e^{-y/7}\left(1 + y/7 + (y/7)^2/2\right)$. From this the expected frequencies are

$50\cdot P(0 < Y < 10) = 8.7$, $50\cdot P(10 < Y < 20) = 18.6$, $50\cdot P(20 < Y < 30) = 12.9$,
$50\cdot P(30 < Y < 40) = 6.2$, $50\cdot P(Y > 40) = 3.8$. Thus,

Treatment time        0-10    10-20    20-30    30-40    40-
Expected frequency    8.7     18.6     12.9     6.2      3.8
Observed frequency    10      16       13       6        5

$X^2_{obs} = \frac{(10-8.7)^2}{8.7} + \ldots + \frac{(5-3.8)^2}{3.8} = 0.90$, p-value $= P(\chi^2(5-1-2) > 0.90) = 0.64$.

The gamma distribution seems to give a much better fit to data than the uniform distribution.
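The two fits in EX 75 and EX 76 follow the same recipe: class probabilities from an estimated cdf, expected frequencies, then the Chi-square sum. The sketch below (Python, with helper names `expected_counts` and `chi_square` invented for this illustration) takes the estimates 45.9, 1/7 and 3 from the text as given. Since it does not round the expected frequencies to one decimal, the statistics differ slightly from the hand-computed values.

```python
import math

observed = [10, 16, 13, 6, 5]          # classes 0-10, 10-20, 20-30, 30-40, 40-
cuts = [0.0, 10.0, 20.0, 30.0, 40.0]   # lower class limits; the last class is open-ended
n = sum(observed)

def expected_counts(cdf):
    """Expected class frequencies n * (F(upper) - F(lower)); the last class gets the tail."""
    probs = [cdf(b) - cdf(a) for a, b in zip(cuts[:-1], cuts[1:])]
    probs.append(1.0 - cdf(cuts[-1]))
    return [n * p for p in probs]

def chi_square(obs, exp):
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

# Uniform(b) with the estimate b = 45.9 from EX 75
b_hat = 45.9
def uniform_cdf(y):
    return min(max(y / b_hat, 0.0), 1.0)

# Gamma(lambda, k) with lambda = 1/7 and integer k = 3 from EX 76
lam, k = 1 / 7, 3
def gamma_cdf(y):
    return 1.0 - math.exp(-lam * y) * sum((lam * y) ** i / math.factorial(i) for i in range(k))

x2_uniform = chi_square(observed, expected_counts(uniform_cdf))
x2_gamma = chi_square(observed, expected_counts(gamma_cdf))
print(round(x2_uniform, 2), round(x2_gamma, 2))
```

The much smaller value for the gamma model mirrors the conclusion drawn in EX 76.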
EX 77 The production of goods by a particular method has for a long time resulted in 25% good, 62% medium and 13% bad products. In a test with a new method 50 products were produced. Of these 20 were good, 19 were medium and 11 were bad.

Does the new method give products of a different quality or are the observed differences merely due to chance?

$H_0$: The new method gives products of the same quality as the older method.

Quality               Good                Medium              Bad
Expected frequency    0.25·50 = 12.5      0.62·50 = 31.0      0.13·50 = 6.5
Observed frequency    20                  19                  11

$X^2_{obs} = \frac{(20-12.5)^2}{12.5} + \frac{(19-31.0)^2}{31.0} + \frac{(11-6.5)^2}{6.5} = 12.3$, p-value $= P(\chi^2(3-1-0) > 12.3) = 0.002$.

(No parameters have been estimated.) There is thus a strong reason to reject $H_0$.

Let's look more closely at the differences.

Quality            Good            Medium          Bad
Deviation          +7.5            -12.0           +4.5
Cell Chi-square    4.50 p<5%       4.65 p<5%       3.12 (NS)

The total Chi-square of 12.3 above shows that there is a significant difference. The table of deviations explains in a way the nature of the difference. The new method involves an over-representation of good products and an under-representation of medium products. (NS is an often used abbreviation for 'Not Significant'.)
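The deviations and cell Chi-squares of EX 77 are easy to reproduce. The sketch below is written in Python with variable names chosen for this illustration; it uses the fact that the upper tail of a chi-square distribution with 2 degrees of freedom is exp(-x/2), and compares each cell with 3.8416, the 5% point of the chi-square distribution with 1 degree of freedom.

```python
import math

probs = {"good": 0.25, "medium": 0.62, "bad": 0.13}   # long-run proportions under H0
observed = {"good": 20, "medium": 19, "bad": 11}      # outcome with the new method
n = sum(observed.values())

deviation = {q: observed[q] - n * probs[q] for q in probs}
cell_chi2 = {q: deviation[q] ** 2 / (n * probs[q]) for q in probs}
x2 = sum(cell_chi2.values())

# With 3 - 1 - 0 = 2 degrees of freedom the upper tail is exp(-x/2)
p_value = math.exp(-x2 / 2)

# A single cell is compared with the 5% point of chi2(1), 3.8416 = 1.96**2
significant = {q: c > 3.8416 for q, c in cell_chi2.items()}
print(round(x2, 1), round(p_value, 3), significant)
```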


6.2.2 The Likelihood Ratio principle 


To test the hypothesis $H_0: (\theta_1, \theta_2, \ldots) = (\theta_1^0, \theta_2^0, \ldots)$ against $H_a: (\theta_1, \theta_2, \ldots) \ne (\theta_1^0, \theta_2^0, \ldots)$ we consider the Likelihood Ratio (LR) statistic $\Lambda = \hat L_0/\hat L$. Here $\hat L_0$ is the likelihood under $H_0$ with ML estimators inserted for the parameters. $\hat L$ is the corresponding likelihood under both $H_0$ and $H_a$, i.e. without any restrictions on the parameters. An obvious rejection region (RR) is $\Lambda < c$, or $-\ln\Lambda > c'$. (This follows because $0 < \Lambda \le 1$.) The LR test is performed in the following way:

1. Compute the ML estimates of the parameters under $H_0$ and under $H_a$.   (26)
2. Compute the value of the LR statistic, say $\Lambda_{obs}$.
3. Compute the p-value for $H_0$ from p-value $= P(\chi^2(r-s) > -2\ln\Lambda_{obs})$, where r = Number of parameters estimated without restrictions on the parameters and s = Number of parameters estimated under $H_0$.

Notice that this test is a large-sample test. (Strictly speaking the test should be termed estimated LR test (ELR), since estimates are plugged in for the parameters.)




EX 78

a) Make a LR test based on the fictitious data in EX 70 to test $H_0: p = 1/2$ against $H_a: p \ne 1/2$.
b) Compare the LR and the Chi-square tests when n = 100 and $Y_1 = 60$.

a) The likelihood is $L = c\cdot p^y(1-p)^{n-y}$, where $c = \binom{n}{y}$. Under $H_0$ we get $L_0 = c\cdot(1/2)^y(1/2)^{n-y}$. (There are no parameters to estimate.) The unrestricted ML estimate of p is

$\hat p = y/n \Rightarrow \hat L = c\cdot(y/n)^y(1-y/n)^{n-y} \Rightarrow \Lambda = \frac{L_0}{\hat L} = \frac{c\cdot(1/2)^y(1/2)^{n-y}}{c\cdot(y/n)^y(1-y/n)^{n-y}} = \frac{1}{(2y/n)^y\,(2(1-y/n))^{n-y}}$

$\Rightarrow -2\ln\Lambda = 2\left(y\ln(2y/n) + (n-y)\ln[2(1-y/n)]\right)$.

p-value $= P(\chi^2(1-0) > -2\ln\Lambda_{obs})$.

b) Chi-square test

$X^2_{obs} = \frac{(60 - 100\cdot 1/2)^2}{100\cdot 1/2} + \frac{(40 - 100\cdot 1/2)^2}{100\cdot 1/2} = 4.00 \Rightarrow$ p-value $= P(\chi^2(1) > 4.00) = 0.0455$.

LR test

$-2\ln\Lambda = 2\left(60\ln(2\cdot 60/100) + (100-60)\ln[2(1-60/100)]\right) = 4.03 \Rightarrow$ p-value $= P(\chi^2(1) > 4.03) = 0.0447$.

The two p-values are roughly the same. In practice it will suffice to just notice that the p-values are below 5%, so $H_0$ is rejected at the 5% level.
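The comparison in EX 78 b) can be checked numerically. In the Python sketch below the helper `chi2_sf_df1` is a name introduced for the illustration; it uses the identity P(chi2(1) > x) = P(|Z| > sqrt(x)) for a standard normal Z, which can be expressed with the complementary error function from the standard library.

```python
import math

def chi2_sf_df1(x):
    """Upper tail P(chi2(1) > x) = P(|Z| > sqrt(x)) for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2))

n, y = 100, 60

# Chi-square statistic for H0: p = 1/2
x2 = (y - n / 2) ** 2 / (n / 2) + ((n - y) - n / 2) ** 2 / (n / 2)

# LR statistic -2 ln(Lambda) = 2 * (y ln(2y/n) + (n - y) ln(2(1 - y/n)))
lr = 2 * (y * math.log(2 * y / n) + (n - y) * math.log(2 * (1 - y / n)))

print(x2, chi2_sf_df1(x2))
print(lr, chi2_sf_df1(lr))
```

The LR p-value printed by the script is computed from the unrounded statistic, so its last digit may differ from the value obtained after rounding the statistic to 4.03.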
EX 79 Test of equality between two proportions in independent Binomial samples.

Often one is interested in comparing two proportions, e.g. the proportion of smokers among men and women. In such a case one takes a sample of men and a sample of women that is independent of the first sample. (A sample of couples is thus not appropriate.) Data can be summarized in the following way:

               Sample 1 (Men)              Sample 2 (Women)
               Frequency     Probability   Frequency     Probability   Total
Smokers        y_1           p_1           y_2           p_2           y_1 + y_2
Non-smokers    n_1 - y_1     1 - p_1       n_2 - y_2     1 - p_2       n_1 + n_2 - (y_1 + y_2)
Total          n_1           1             n_2           1             n_1 + n_2

We want to test $H_0: p_1 = p_2\ (= p)$ against $H_a: p_1 \ne p_2$.

The unrestricted likelihood is $L = \binom{n_1}{y_1} p_1^{y_1}(1-p_1)^{n_1-y_1} \binom{n_2}{y_2} p_2^{y_2}(1-p_2)^{n_2-y_2}$ with two parameters to estimate.

The likelihood under $H_0$ is $L_0 = \binom{n_1}{y_1}\binom{n_2}{y_2}\, p^{y_1+y_2}(1-p)^{n_1+n_2-(y_1+y_2)}$ with one parameter to estimate.

$\ln L = \text{const.} + y_1\ln p_1 + (n_1-y_1)\ln(1-p_1) + y_2\ln p_2 + (n_2-y_2)\ln(1-p_2) \Rightarrow$

$\frac{d\ln L}{dp_1} = \frac{y_1}{p_1} - \frac{(n_1-y_1)}{(1-p_1)} = 0 \Rightarrow \hat p_1 = \frac{y_1}{n_1}, \qquad \frac{d\ln L}{dp_2} = \frac{y_2}{p_2} - \frac{(n_2-y_2)}{(1-p_2)} = 0 \Rightarrow \hat p_2 = \frac{y_2}{n_2}$

$\ln L_0 = \text{const.} + (y_1+y_2)\ln p + (n_1+n_2-(y_1+y_2))\ln(1-p) \Rightarrow$

$\frac{d\ln L_0}{dp} = \frac{y_1+y_2}{p} - \frac{(n_1+n_2-(y_1+y_2))}{(1-p)} = 0 \Rightarrow \hat p = \frac{y_1+y_2}{n_1+n_2}$

$\Lambda = \frac{\left(\frac{y_1+y_2}{n_1+n_2}\right)^{y_1+y_2}\left(1-\frac{y_1+y_2}{n_1+n_2}\right)^{n_1+n_2-(y_1+y_2)}}{\left(\frac{y_1}{n_1}\right)^{y_1}\left(1-\frac{y_1}{n_1}\right)^{n_1-y_1}\left(\frac{y_2}{n_2}\right)^{y_2}\left(1-\frac{y_2}{n_2}\right)^{n_2-y_2}}$

The p-value is finally obtained from $P(\chi^2(2-1) > -2\ln\Lambda_{obs})$.
EX 80 LR test of equality between marginal proportions.

a) Show how the LR test can be used to test $H_0: p_{12} = p_{21}\ (= p)$ in the 2 x 2 table.
b) Perform the test in the following case where two methods A and B are used to classify the same products into two categories 1 and 2.

                   Method B
                   1       2
Method A    1      32      19
            2      21      18

c) Test the same hypothesis by using a Chi-square test.

a) From EX 71 the estimated likelihood under $H_0$ is $\hat L_0 = c\cdot \hat p_{11}^{\,y_{11}}\, \hat p^{\,y_{21}+y_{12}}\, \hat p_{22}^{\,y_{22}}$, where the estimates are $\hat p_{11} = \frac{y_{11}}{n}$, $\hat p = \frac{y_{21}+y_{12}}{2n}$, $\hat p_{22} = \frac{y_{22}}{n}$. The unrestricted estimated likelihood is $\hat L = c\cdot \hat p_{11}^{\,y_{11}} \hat p_{21}^{\,y_{21}} \hat p_{12}^{\,y_{12}} \hat p_{22}^{\,y_{22}}$, where the estimates are $\hat p_{ij} = \frac{y_{ij}}{n}$, $i, j = 1, 2$. The LR statistic is

$\Lambda = \frac{\hat p^{\,y_{21}+y_{12}}}{\hat p_{21}^{\,y_{21}}\,\hat p_{12}^{\,y_{12}}} \Rightarrow -2\ln\Lambda = -2\left\{(y_{21}+y_{12})\ln\hat p - y_{21}\ln\hat p_{21} - y_{12}\ln\hat p_{12}\right\}$. p-value $= P(\chi^2(3-2) > -2\ln\Lambda_{obs})$.

b) $-2\ln\Lambda_{obs} = 3.9729 \Rightarrow$ p-value $= P(\chi^2(1) > 3.9729) = 0.0462$. $H_0$ is thus rejected at the 5% level. This conclusion can of course also be reached by noticing that the RR consists of values larger than $3.8416 = (1.96)^2$.

c) The Chi-square test is based on the statistic $X^2 = \frac{(y_{21}-y_{12})^2}{y_{21}+y_{12}}$. In this case $X^2_{obs} = 3.92$ giving the p-value $P(\chi^2(1) > 3.92) = 0.0477$, very close to that obtained by the LR test.
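Both statistics in EX 80 depend on the data only through the two off-diagonal frequencies. The Python sketch below illustrates this with helper names invented for the purpose. The counts 32 and 18 are an assumption made here: they are the pair of off-diagonal frequencies that reproduces the reported values 3.9729 and 3.92.

```python
import math

def chi2_sf_df1(x):
    """Upper tail P(chi2(1) > x)."""
    return math.erfc(math.sqrt(x / 2))

def mcnemar_chi2(y12, y21):
    """McNemar statistic (y12 - y21)^2 / (y12 + y21) from EX 71."""
    return (y12 - y21) ** 2 / (y12 + y21)

def mcnemar_lr(y12, y21):
    """-2 ln(Lambda) for H0: p12 = p21; n cancels since p_hat / p_hat_ij = (y12 + y21) / (2 y_ij)."""
    m = y12 + y21
    return 2 * (y12 * math.log(2 * y12 / m) + y21 * math.log(2 * y21 / m))

# Assumed off-diagonal counts (see the remark above); both must be positive
y12, y21 = 32, 18
x2 = mcnemar_chi2(y12, y21)
lr = mcnemar_lr(y12, y21)
print(round(x2, 2), round(lr, 4), round(chi2_sf_df1(lr), 4))
```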


EX 81 LR test of independency.

a) Show how the LR test can be used to test $H_0: p_{ij} = p_{i+}p_{+j}$ (independence) in the 2 x 2 table.
b) Apply the test to the following data illustrating the relation between left/right-handedness and type of twin (identical = 1, fraternal = 2).

                  Type of twin
                  1        2
Right-handed      207      228
Left-handed       41       18

c) Test the same hypothesis with the Chi-square test.

a) Under $H_0$ the likelihood is $\hat L_0 = c\,(\hat p_{1+}\hat p_{+1})^{y_{11}}(\hat p_{1+}\hat p_{+2})^{y_{12}}(\hat p_{2+}\hat p_{+1})^{y_{21}}(\hat p_{2+}\hat p_{+2})^{y_{22}}$,

where $\hat p_{i+} = \frac{y_{i1}+y_{i2}}{n}$, $i = 1, 2$ and $\hat p_{+j} = \frac{y_{1j}+y_{2j}}{n}$, $j = 1, 2$.

The unrestricted likelihood is $\hat L = c\cdot \hat p_{11}^{\,y_{11}} \hat p_{12}^{\,y_{12}} \hat p_{21}^{\,y_{21}} \hat p_{22}^{\,y_{22}}$, where $\hat p_{ij} = \frac{y_{ij}}{n}$.

The LR statistic is $\Lambda = \frac{\hat L_0}{\hat L}$ and the p-value is $P(\chi^2(3-2) > -2\ln\Lambda_{obs})$. In this case the likelihood ratio can't be simplified as in EX 78.

b) After laborious computations we obtain $-2\ln\Lambda_{obs} = 10.21$ and the p-value is $P(\chi^2(1) > 10.21) = 0.0014$.

c) $X^2_{obs} = 9.97$ and the p-value is $P(\chi^2(1) > 9.97) = 0.0016$. The following table is obtained for deviations:

Deviation/Cell Chi-square
                  Type of twin
                  1              2
Right-handed      -11.4/0.59     11.4/0.60
Left-handed       11.4/4.37      -11.4/4.41

From the table it is concluded that the rejection of the null hypothesis is mainly due to a significant over-representation of identical twins (1) that are left-handed.
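The 'laborious computations' in EX 81 are a natural task for a computer. The Python sketch below (variable names are chosen for this illustration) computes the expected frequencies, deviations, cell Chi-squares, the total Chi-square and the LR statistic. It uses the fact that, with the estimates inserted, $-2\ln\Lambda = 2\sum y_{ij}\ln(y_{ij}/e_{ij})$ where $e_{ij} = y_{i+}y_{+j}/n$; this requires all cell frequencies to be positive.

```python
import math

def chi2_sf_df1(x):
    """Upper tail P(chi2(1) > x)."""
    return math.erfc(math.sqrt(x / 2))

table = [[207, 228],   # right-handed: identical, fraternal
         [41, 18]]     # left-handed:  identical, fraternal

n = sum(map(sum, table))
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]

# Expected frequencies under independence, deviations and cell Chi-squares
expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]
deviation = [[y - e for y, e in zip(yr, er)] for yr, er in zip(table, expected)]
cell_chi2 = [[d ** 2 / e for d, e in zip(dr, er)] for dr, er in zip(deviation, expected)]

x2 = sum(map(sum, cell_chi2))
lr = 2 * sum(y * math.log(y / e) for yr, er in zip(table, expected) for y, e in zip(yr, er))
print(round(x2, 2), round(lr, 2))
```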
Comment to EX 81 The LR test for independence is laborious compared with the Chi-square test. The latter is also more informative since the source of the total Chi-square can be explained in terms of separate deviations. Today most statistical software presents results for both tests, so ease of calculation is not a problem. It may be tempting to report the p-value that is lowest and this seems often to be the one obtained from the LR test, but in that case you lose the informative aspect mentioned above.

The p-values in EX 81 have been reported with too many decimals just to illustrate differences.


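EX 82 In EX 62 a CI for the ratio of two Poisson rates $\lambda_X$ and $\lambda_Y$ was used to claim a significant difference between the rates. We now show how the same problem can be solved by LR testing.

Consider the hypothesis $H_0: \lambda_X = \lambda_Y\ (= \lambda)$ against $H_a: \lambda_X \ne \lambda_Y$.
Since the two samples are independent (Cf. EX 62) the unrestricted likelihood is

$L = \frac{(\lambda_X s)^{x(s)}}{x(s)!}e^{-\lambda_X s}\cdot\frac{(\lambda_Y t)^{y(t)}}{y(t)!}e^{-\lambda_Y t}$ and the likelihood under $H_0$ is

$L_0 = \frac{s^{x(s)}\,t^{y(t)}}{x(s)!\,y(t)!}\,\lambda^{x(s)+y(t)}e^{-\lambda(s+t)}$. From this the following estimates are easily obtained:

$\hat\lambda_X = \frac{x(s)}{s}, \quad \hat\lambda_Y = \frac{y(t)}{t}, \quad \hat\lambda = \frac{x(s)+y(t)}{s+t}$.

The estimated LR reduces to $\Lambda = \frac{\hat L_0}{\hat L} = \frac{\hat\lambda^{\,x(s)+y(t)}}{\hat\lambda_X^{\,x(s)}\,\hat\lambda_Y^{\,y(t)}}$, since many factors cancel each other. The p-value is $P(\chi^2(2-1) > -2\ln\Lambda_{obs})$.

In EX 62, $s = 8$, $t = 4$, $x(8) = 85$, $y(4) = 65 \Rightarrow \hat\lambda_X = \frac{85}{8} = 10.63$, $\hat\lambda_Y = \frac{65}{4} = 16.25$, $\hat\lambda = \frac{85+65}{8+4} = 12.5$.

This gives $-2\ln\Lambda = 6.4791 \Rightarrow$ p-value = 0.0109. The null hypothesis is rejected.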
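The LR statistic of EX 82 takes only a few lines to compute. The following Python sketch uses a function name, `poisson_rate_lr`, that is introduced here for illustration; it works on the log scale so that the large powers in $\Lambda$ never have to be evaluated directly.

```python
import math

def poisson_rate_lr(x, s, y, t):
    """-2 ln(Lambda) for H0: equal rates, with counts x and y observed over times s and t."""
    lam_x, lam_y = x / s, y / t
    lam_0 = (x + y) / (s + t)
    # The exponential factors cancel, leaving Lambda = lam_0^(x+y) / (lam_x^x * lam_y^y)
    return 2 * (x * math.log(lam_x / lam_0) + y * math.log(lam_y / lam_0))

lr = poisson_rate_lr(x=85, s=8, y=65, t=4)
p_value = math.erfc(math.sqrt(lr / 2))   # upper tail of chi2(1)
print(round(lr, 4), round(p_value, 4))
```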
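EX 83 $(Y_i)_{i=1}^n$ are independent with $Y_i \sim N(\alpha + \beta x_i, \sigma^2)$. Show how we can test $H_0: \beta = 0$ against $H_a: \beta \ne 0$.

(This model is the same as in EX 54 with the exception that there is a further parameter $\alpha \ne 0$.)

The unrestricted likelihood is $L = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum(y_i - \alpha - \beta x_i)^2\right) \Rightarrow$

$\frac{d\ln L}{d\alpha} = \frac{(-1)(-2)\sum(y_i - \alpha - \beta x_i)}{2\sigma^2} = 0 \Rightarrow \sum y_i = n\hat\alpha + \hat\beta\sum x_i \Rightarrow \bar y = \hat\alpha + \hat\beta\bar x$   (i)

$\frac{d\ln L}{d\beta} = \frac{(-1)(-2)\sum x_i(y_i - \alpha - \beta x_i)}{2\sigma^2} = 0 \Rightarrow \sum x_i y_i = \hat\alpha\sum x_i + \hat\beta\sum x_i^2$   (ii)

$\frac{d\ln L}{d\sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum(y_i - \alpha - \beta x_i)^2}{2\sigma^4} = 0 \Rightarrow \hat\sigma^2 = \frac{\sum(y_i - \hat\alpha - \hat\beta x_i)^2}{n}$   (iii)

(i), (ii) give $\hat\beta = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{S_{xy}}{S_{xx}}$, $\hat\alpha = \bar y - \hat\beta\bar x$   (iv)

The likelihood under $H_0$ is $L_0 = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum(y_i - \alpha)^2\right) \Rightarrow$

$\frac{d\ln L_0}{d\alpha} = \frac{(-1)(-2)\sum(y_i - \alpha)}{2\sigma^2} = 0 \Rightarrow \hat\alpha = \bar y$

$\frac{d\ln L_0}{d\sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum(y_i - \alpha)^2}{2\sigma^4} = 0 \Rightarrow \hat\sigma^2 = \frac{\sum(y_i - \bar y)^2}{n} = \frac{S_{yy}}{n}$

The latter estimate, obtained under $H_0$, is denoted $\hat\sigma_0^2$.

Thus, $\Lambda = \frac{\hat L_0}{\hat L} = \left(\frac{\hat\sigma^2}{\hat\sigma_0^2}\right)^{n/2}$ (Several factors cancel each other.) $\Rightarrow -2\ln\Lambda = n\ln\frac{\hat\sigma_0^2}{\hat\sigma^2}$. The p-value is $P(\chi^2(1) > -2\ln\Lambda_{obs})$.

The factor $S_{xy}$ in (iv) can be expressed in several ways, e.g. $S_{xy} = \sum(x_i - \bar x)(y_i - \bar y)$. Similarly $S_{xx} = \sum(x_i - \bar x)^2$.

The $\Lambda$-test statistic can be expressed more simply.

$\sum(y_i - \hat\alpha - \hat\beta x_i)^2 = \sum(y_i - (\bar y - \hat\beta\bar x) - \hat\beta x_i)^2 = \sum\left((y_i - \bar y) - \hat\beta(x_i - \bar x)\right)^2 = \sum(y_i - \bar y)^2 + \hat\beta^2\sum(x_i - \bar x)^2 - 2\hat\beta\sum(x_i - \bar x)(y_i - \bar y) = S_{yy} + \hat\beta^2 S_{xx} - 2\hat\beta S_{xy}$ = [Notice that $\hat\beta S_{xx} = S_{xy}$] $= S_{yy} - \hat\beta S_{xy}$.

Thus, $\frac{\hat\sigma^2}{\hat\sigma_0^2} = \frac{S_{yy} - \hat\beta S_{xy}}{S_{yy}} = 1 - \frac{S_{xy}^2}{S_{xx}S_{yy}} = 1 - r^2$, where r is the sample correlation coefficient.

It follows that $-2\ln\Lambda = -n\ln(1 - r^2)$.

If (X, Y) has a bivariate normal distribution it can be shown that the conditional expectation is $E(Y|X = x) = \alpha + \beta x$, where $\beta = \rho\cdot\sigma_Y/\sigma_X$ and $\rho$ is the population correlation coefficient. The hypothesis that $\beta = 0$ is thus equivalent with the hypothesis that $\rho = 0$ and can be tested in the same way.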
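The identity $-2\ln\Lambda = n\ln(\hat\sigma_0^2/\hat\sigma^2) = -n\ln(1-r^2)$ can be verified numerically. The Python sketch below does this for a small data set that is hypothetical and made up for illustration only; the function name `regression_lr` is likewise an invention of this example.

```python
import math

def regression_lr(x, y):
    """Return (-2 ln Lambda, r) for H0: beta = 0 in the model Y_i ~ N(alpha + beta x_i, sigma^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    # ML estimates of the variance, unrestricted and under H0
    sigma2 = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y)) / n
    sigma2_0 = s_yy / n
    r = s_xy / math.sqrt(s_xx * s_yy)
    return n * math.log(sigma2_0 / sigma2), r

# Hypothetical data, made up for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]
lr, r = regression_lr(x, y)
p_value = math.erfc(math.sqrt(lr / 2))   # large-sample p-value from chi2(1)
print(lr, -len(x) * math.log(1 - r ** 2), p_value)
```

With only six observations the large-sample p-value is of course not to be trusted; the point of the example is that the two expressions for the statistic agree.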
6.2.3 Miscellaneous methods

In this section we consider tests that are of a 'common sense-nature'. This means that the test statistics are sensitive to changes in the parameters and are used together with a proper RR. For example, when there is a sample of observations on a variable $Y \sim N(\mu, \sigma^2)$ and we want to test $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$, it is obvious that we shall use a test statistic based on $|\bar Y - \mu_0|$ and reject $H_0$ for large values of the latter quantity. There may be situations where it is less obvious how to perform a test. Then one may use the Neyman-Pearson Lemma which states that, when testing $H_0: \theta = \theta_0$ against $H_a: \theta = \theta_a$, the test with maximal power is obtained from the LR $L_0/L_a < c$ (cf. Ch. 20.10-20.13 in Stuart et al 1999). We will seldom need this Lemma since in the following applications the best RR agrees with the one obtained by common-sense reasoning.


EX 84 Consider again the situation in EX 70 and EX 78 where we test $H_0: p = 1/2$ against $H_a: p \ne 1/2$.

Put $\hat p = \frac{Y}{n}$, with $E(\hat p) = p$ and $V(\hat p) = \frac{p(1-p)}{n}$. Intuitively it seems reasonable to choose the test statistic

$Z = \frac{\hat p - E(\hat p|H_0)}{\sqrt{V(\hat p|H_0)}} = \frac{\hat p - 1/2}{\sqrt{1/4n}}$.

$H_0$ is rejected for large values of $|Z|$ or equivalently, for large values of $T = \left(\frac{\hat p - 1/2}{\sqrt{1/4n}}\right)^2$. However, this is exactly the same test that was obtained by the Chi-square principle.


EX 85 Consider the situation in EX 79 where data were obtained from two independent Binomial samples with proportions $p_1$ and $p_2$ and one wanted to test $H_0: p_1 = p_2\ (= p)$ against $H_a: p_1 \ne p_2$.

a) Construct a test of 'common sense-nature'.
b) In order to test a new vaccine 90 pupils from a school were vaccinated and 66 were not vaccinated. After six months it was noticed how many pupils had caught the flu, with the following result:

               Vaccinated (1)    Not vaccinated (2)
With flu       4                 18
Without flu    86                48
Total          90                66

Test whether the vaccine has a significant preventive effect by using the test in a). Compare the result with that which is obtained by using the LR test in EX 79.

a) Test statistic: $T = \frac{\hat p_1 - \hat p_2 - E(\hat p_1 - \hat p_2|H_0)}{\sqrt{V(\hat p_1 - \hat p_2|H_0)}}$. Here $E(\hat p_1 - \hat p_2|H_0) = p - p = 0$,

$V(\hat p_1 - \hat p_2|H_0) = \frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2} = p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$, where p is unknown and has to be estimated. In EX 46 it was seen that if $(Y_i)_{i=1}^m$ are independent and $Y_i \sim$ Binomial$(n_i, p)$ then $\hat p = \frac{\sum Y_i}{\sum n_i}$ is an ML estimator of p and also BLUE. In this case $\hat p = \frac{Y_1+Y_2}{n_1+n_2}$. Thus, the test statistic to be used is

$T' = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}}$. What about the distribution of $T'$?

According to the CLT $T \xrightarrow{D} Z \sim N(0,1)$ as $n_1, n_2 \to \infty$. By similar arguments as in EX 23 c) it then follows that also $T'$ has a limiting standard normal distribution. (Recall that $Y_1 + Y_2 \sim$ Binomial$(n_1+n_2, p)$.) The latter convergence is however slower.

$H_0$ is rejected for large values of $|T'|$, so the p-value is $2\cdot P(Z > |T'_{obs}|)$.

b) From the table we get $\hat p_1 = \frac{4}{90} = 0.0444$, $\hat p_2 = \frac{18}{66} = 0.2727$, $\hat p = \frac{4+18}{90+66} = 0.1410$, from which

$T' = \frac{0.0444 - 0.2727}{\sqrt{0.141\cdot 0.859\,(1/90 + 1/66)}} = -4.04 \Rightarrow$ p-value $= 2\cdot P(Z > 4.04) = 0.00006$

The LR test in EX 79 gives

$\Lambda = \frac{(22/156)^{22}(1 - 22/156)^{156-22}}{(4/90)^4(1 - 4/90)^{90-4}(18/66)^{18}(1 - 18/66)^{66-18}} \Rightarrow -2\ln\Lambda = 16.8547 \Rightarrow$

p-value $= P(\chi^2(2-1) > 16.8547) = 0.00004$

[Don't calculate $\Lambda$ directly, but instead $\ln\Lambda = 22\ln(22/156) + \ldots - (66-18)\ln(1 - 18/66)$.]

Both p-values are very small and are close to each other. The conclusion is that the vaccine has a significant preventive effect (p-value < 0.001). Avoid statements such as '$H_0$ is rejected' or 'p-value = 0.00006' if results are to be reported in a scientific journal.
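The two tests in EX 85 can be computed side by side. In the Python sketch below the function `two_proportion_tests` is a name made up for this example; it follows the advice above and computes $\ln\Lambda$ on the log scale. The normal p-value uses the identity $2P(Z > |t|) = \mathrm{erfc}(|t|/\sqrt 2)$.

```python
import math

def two_proportion_tests(y1, n1, y2, n2):
    """Return (T', -2 ln Lambda) for H0: p1 = p2 in two independent binomial samples."""
    p1, p2 = y1 / n1, y2 / n2
    p = (y1 + y2) / (n1 + n2)                      # pooled estimate under H0
    t = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

    def loglik(y, n, prob):
        # Binomial log-likelihood without the constant; requires 0 < y < n
        return y * math.log(prob) + (n - y) * math.log(1 - prob)

    ln_lambda = loglik(y1 + y2, n1 + n2, p) - loglik(y1, n1, p1) - loglik(y2, n2, p2)
    return t, -2 * ln_lambda

t, lr = two_proportion_tests(4, 90, 18, 66)
p_normal = math.erfc(abs(t) / math.sqrt(2))    # 2 * P(Z > |t|)
p_lr = math.erfc(math.sqrt(lr / 2))            # P(chi2(1) > lr)
print(round(t, 2), round(lr, 4), p_normal < 0.001, p_lr < 0.001)
```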


EX 86 In EX 62 two rates $\lambda_X$ and $\lambda_Y$ in a Poisson process were compared. A 95% CI for the ratio $R = \lambda_Y/\lambda_X$ was (1.09, 2.17) and it was concluded that there was a significant difference between the rates.

The same data were analyzed in EX 82 by performing a LR test, giving the p-value 0.0109. The hypothesis of equal rates was thus rejected.

Consider now a third way to analyze the data, by using a test based on the conditional Poisson property in (3).

$(Y(t)\,|\,X(s) + Y(t) = n) \sim$ Binomial$\left(n,\ p = \frac{\lambda_Y t}{\lambda_X s + \lambda_Y t}\right)$

$H_0: \lambda_X = \lambda_Y \Leftrightarrow H_0: p = \frac{t}{s+t}$. Here $X(8) = 85$, $Y(4) = 65$, $\hat p = \frac{65}{85+65} = 0.4333$ and $H_0: p = 1/3$ against $H_a: p \ne 1/3$.

Test statistic $T = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.4333 - 1/3}{\sqrt{\frac{1}{3}\cdot\frac{2}{3}\cdot\frac{1}{150}}} = 2.598$. Since $T \xrightarrow{D} Z \sim N(0,1)$ the p-value is $2P(Z > 2.598) = 0.0094$. The null hypothesis is thus strongly rejected (p-value < 0.01).
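The conditional test in EX 86 only involves a Binomial proportion, so it is quickly checked. The following Python sketch uses variable names chosen for this illustration.

```python
import math

x, s = 85, 8     # events and observation time for the first process
y, t = 65, 4     # events and observation time for the second process

n = x + y
p0 = t / (s + t)            # value of p under H0: equal rates
p_hat = y / n
T = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = math.erfc(abs(T) / math.sqrt(2))   # 2 * P(Z > |T|)
print(round(T, 3), round(p_value, 4))
```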


EX 87 In EX 51 it was shown how CIs for the parameters in the normal distribution can be constructed. Assume that $(Y_i)_{i=1}^n$ are iid where $Y_i \sim N(\mu, \sigma^2)$.

a) Show how to test $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$.
b) Show how to test $H_0: \sigma^2 = \sigma_0^2$ against $H_a: \sigma^2 \ne \sigma_0^2$.
c) Apply the tests with $\mu_0 = 16.0$ and $\sigma_0^2 = 0.4$ when $n = 10$, $\bar y = 16.67$, $s^2 = 0.7312$. Compare these results with the results that are obtained using an approach based on CIs.

a) $T = \frac{\bar Y - \mu_0}{S/\sqrt n} \sim T(n-1)$. (Cf. EX 51.) p-value $= 2P(T(n-1) > |T_{obs}|)$.

b) $T = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2(n-1)$. (Cf. EX 51.) p-value $= 2P(\chi^2(n-1) > T_{obs})$.

c) $n = 10$, $\bar y = 16.67$, $s^2 = 0.7312$.

To test $H_0: \mu = 16.0$ against $H_a: \mu \ne 16.0$ consider the test statistic

$T = \frac{16.67 - 16.0}{\sqrt{0.7312/10}} = 2.478 \Rightarrow$ p-value $= 2P(T(9) > 2.478) = 0.035$. Here it suffices to conclude from a T-table that p-value < 0.05.

A 95% CI for $\mu$ is given by $\bar Y \pm C\frac{S}{\sqrt n}$, where C is determined by $P(T(9) > C) = 0.025 \Rightarrow C = 2.262$.

Thus, $16.67 \pm 2.262\sqrt{0.7312/10}$, or (16.06, 17.28). Both approaches suggest that $H_0$ is rejected at the 5% level.

To test $H_0: \sigma^2 = 0.4$ against $H_a: \sigma^2 \ne 0.4$ consider the test statistic $T = \frac{9\cdot 0.7312}{0.4} = 16.45 \Rightarrow$ p-value $= 2P(\chi^2(9) > 16.45) = 0.116$, so $H_0$ is not rejected.

A 95% CI for $\sigma^2$ is given by $\frac{(n-1)S^2}{b} < \sigma^2 < \frac{(n-1)S^2}{a}$, where a = 2.7004 and b = 19.0228. (See EX 51 for details.)

This gives the interval (0.35, 2.44) and $H_0$ is not rejected in this case either.
EX 88 Given two independent sets of iid variables $(X_i)_{i=1}^{n_X}$ and $(Y_i)_{i=1}^{n_Y}$ where $X_i \sim N(\mu_X, \sigma_X^2)$ and $Y_i \sim N(\mu_Y, \sigma_Y^2)$.

a) Show how to test $H_0: \sigma_X^2 = \sigma_Y^2$ against $\sigma_X^2 \ne \sigma_Y^2$.
b) Show how to test $H_0: \mu_X = \mu_Y\ (= \mu)$ against $\mu_X \ne \mu_Y$.
c) In two independent data sets one obtains

$(n_X = 10,\ \sum x_i = 126,\ \sum x_i^2 = 1692)$ and $(n_Y = 8,\ \sum y_i = 127,\ \sum y_i^2 = 2122)$. Perform the tests in a) and b) above and compare the results with those obtained by making CIs.

a) Let the largest of the two sample variances be $S_Y^2$. Then (Cf. EX 53.) $\frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(n_Y - 1, n_X - 1)$ which under $H_0$ becomes $\frac{S_Y^2}{S_X^2} \sim F(n_Y - 1, n_X - 1)$. The p-value (two-sided) is $2P\left(F(n_Y - 1, n_X - 1) > \frac{s_Y^2}{s_X^2}\right)$.

Notice that, if we for some reason want to test $H_0: \sigma_X^2 = c\cdot\sigma_Y^2$, then the test statistic is $c\cdot\frac{S_Y^2}{S_X^2}$.

b) A suitable test statistic is $T = \frac{\bar X - \bar Y - E(\bar X - \bar Y|H_0)}{\sqrt{V(\bar X - \bar Y|H_0)}} = \frac{\bar X - \bar Y}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}$.

If $\sigma_X^2$ and $\sigma_Y^2$ were known then T would be distributed N(0,1), but in practice the variances are unknown and have to be estimated. If the test in a) suggests that the variances could be assumed to be equal, $\sigma_X^2 = \sigma_Y^2 = \sigma^2$, then we estimate this by $\hat\sigma^2 = \frac{(n_X - 1)S_X^2 + (n_Y - 1)S_Y^2}{n_X + n_Y - 2}$. (Cf. EX 53.) It follows that the test statistic to be used is $T' = \frac{\bar X - \bar Y}{\sqrt{\hat\sigma^2(1/n_X + 1/n_Y)}}$. The p-value (two-sided) is $2P(T(n_X + n_Y - 2) > |T'_{obs}|)$.

If the test in a) suggests that the variances are unequal then we are faced with the Behrens-Fisher problem mentioned in the Comments to EX 53.

When both $n_X$ and $n_Y$ are large things become simpler since we can use the fact that

$T'' = \frac{\bar X - \bar Y}{\sqrt{S_X^2/n_X + S_Y^2/n_Y}} = \frac{\bar X - \bar Y}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}\cdot\frac{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}{\sqrt{S_X^2/n_X + S_Y^2/n_Y}} \xrightarrow{D} Z \sim N(0,1)$,

where the first factor is distributed N(0,1) and the second factor tends to 1 in probability.

The convergence in probability of the second factor can be motivated in the following way: From (9b) $E(S_X^2) = \sigma_X^2$ and $V(S_X^2) = \text{const.}/n_X \Rightarrow S_X^2 \xrightarrow{P} \sigma_X^2$ (Cf. (10)). Similarly, $S_Y^2 \xrightarrow{P} \sigma_Y^2$. (11a) and (11b) then give the result.

c) $\bar x = 12.600$, $s_X^2 = \frac{1692 - (126)^2/10}{(10-1)} = 11.60$, $\bar y = 15.875$, $s_Y^2 = \frac{2122 - (127)^2/8}{(8-1)} = 15.13$.

Test of $H_0: \sigma_X^2 = \sigma_Y^2$ against $H_a: \sigma_X^2 \ne \sigma_Y^2$

$\frac{s_Y^2}{s_X^2} = \frac{15.13}{11.60} = 1.30 \Rightarrow$ p-value $= P(F(7,9) > 1.30) = 0.34$. We can thus assume equal variances.

Test of $H_0: \mu_X = \mu_Y$ against $H_a: \mu_X \ne \mu_Y$

$\hat\sigma^2 = \frac{(10-1)\cdot 11.60 + (8-1)\cdot 15.13}{10 + 8 - 2} = 13.14$, $T' = \frac{15.875 - 12.600}{\sqrt{13.14(1/10 + 1/8)}} = 1.91 \Rightarrow$ p-value $= 2P(T(10+8-2) > 1.91) = 2\cdot 0.0371 = 0.074$. We can't reject $H_0$ at the 5% level.

A 95% CI for the variance ratio is $\frac{S_Y^2}{S_X^2\cdot c_2} < \frac{\sigma_Y^2}{\sigma_X^2} < \frac{S_Y^2}{S_X^2\cdot c_1}$ (Cf. EX 53.). Here $c_1$ and $c_2$ are constants that are determined in the following way:

$P(F(7,9) < c_1) = 0.025$ can't be found in most tables, but $P(F(7,9) < c_1) = P(F(9,7) > 1/c_1) = 0.025 \Rightarrow 1/c_1 = 4.82 \Rightarrow c_1 = 0.21$. $P(F(7,9) > c_2) = 0.025 \Rightarrow c_2 = 4.20$.

Thus, the CI is $\left(\frac{15.13}{11.60\cdot 4.20}, \frac{15.13}{11.60\cdot 0.21}\right) = (0.31, 6.21)$, in accordance with the test result above.

A 95% CI for $\mu_Y - \mu_X$ is $(\bar Y - \bar X) \pm C\sqrt{\hat\sigma^2(1/n_X + 1/n_Y)}$ (Cf. EX 53.). C is determined by $P(T(16) > C) = 0.025 \Rightarrow C = 2.120$.

Thus, $15.875 - 12.600 \pm 2.120\sqrt{13.14(1/10 + 1/8)} = 3.28 \pm 3.65$. Since the interval covers zero the difference between the means is not significant.
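The arithmetic in EX 88 c) can be retraced from the sums alone. The Python sketch below (with the helper `mean_var` introduced for this illustration) computes the sample variances, the variance ratio, the pooled variance, the test statistic and the two CIs. The critical values 2.120, 0.21 and 4.20 are taken from the text, i.e. from tables, and are not computed by the script.

```python
import math

def mean_var(n, total, total_sq):
    """Sample mean and variance from n, sum(x) and sum(x^2)."""
    mean = total / n
    return mean, (total_sq - total ** 2 / n) / (n - 1)

n_x, n_y = 10, 8
x_bar, s2_x = mean_var(n_x, 126, 1692)
y_bar, s2_y = mean_var(n_y, 127, 2122)

f_ratio = s2_y / s2_x                                   # larger variance in the numerator
pooled = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)
se = math.sqrt(pooled * (1 / n_x + 1 / n_y))
t_stat = (y_bar - x_bar) / se

# Critical values quoted in EX 88 (taken from tables, not computed here)
C, c1, c2 = 2.120, 0.21, 4.20
ci_means = (y_bar - x_bar - C * se, y_bar - x_bar + C * se)
ci_var_ratio = (f_ratio / c2, f_ratio / c1)
print(round(f_ratio, 2), round(t_stat, 2), ci_means, ci_var_ratio)
```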


EX 89 $(Y_i)_{i=1}^n$ are iid variables with an arbitrary distribution and with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Show how to test $H_0: \mu = \mu_0$ when n is large.

In EX 58 it was shown that $\frac{\bar Y - \mu}{S/\sqrt n} \xrightarrow{D} Z \sim N(0,1)$ as $n \to \infty$. As a test statistic we thus choose $T = \frac{\bar Y - \mu_0}{S/\sqrt n}$ and the p-value is $2P(Z > |T_{obs}|)$ for a two-sided alternative.
EX 90 $(X_i)_{i=1}^{n_X}$ and $(Y_i)_{i=1}^{n_Y}$ are independent sets of iid variables with arbitrary distributions and finite means and variances. Show how to test $H_0: \mu_X = \mu_Y$ when both sample sizes are large.

The test statistic to use is $T = \frac{\bar X - \bar Y}{\sqrt{S_X^2/n_X + S_Y^2/n_Y}}$, which is approximately distributed N(0,1).

An argument for the distribution follows from EX 88 b). In the latter case all variables were assumed to have normal distributions. But, looking back at the proof it is seen that the standardized difference $\frac{\bar X - \bar Y}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}$ tends in distribution to a N(0,1)-variable also for iid variables with arbitrary distributions.

The statistic above can also be used for constructing a CI for the difference between means.


6.2.4 Nonparametric methods 


In earlier chapters inference has been based on estimated parameters in probability models. Such problems 
are said to be parametric, and others are called nonparametric. The distinction between the two methods 
is not clear-cut. Test of independency or test of equal marginal proportions are sometimes referred 
to as nonparametric methods, although a lot of parameters are involved. A tentative position is that 
nonparametric methods are less affected by unrealistic assumptions. The latter are however also based on 


assumptions, something that is often overlooked, especially that the observations are assumed to be iid. 


Goodness of Fit and comparison of distributions based on the sample-distribution function


We saw in Ch. 6.2.1 that the Chi-square test can be used to test the hypothesis that the observations in a sample come from a certain distribution. This Goodness of Fit problem was solved by first estimating the parameters in the hypothetical distribution and then comparing observed and expected frequencies. In this section we consider an alternative way to test for Goodness of Fit which we call the Kolmogorov test, also called the Kolmogorov-Smirnov test, proposed by the Russian statistician A. Kolmogorov in 1933. An important difference between the two tests is that in the Chi-square test the parameters are estimated, but in the Kolmogorov test the parameters have to be specified (or known). A further limitation is that the Kolmogorov test (in its original form) can only be applied to continuous distributions. Some attempts have been made to use the Kolmogorov test when parameters are estimated, e.g. in the Exponential or the Normal distribution. In the latter case the adjusted test is called the Lilliefors test and is provided by many statistical software packages (e.g. in proc univariate in SAS).


The sample cdf, $S_n(y)$, is constructed in the following way. Rank all observations from the smallest to the largest, $y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$. From the sequence $(y_{(i)}, i/n)$ we then form the step function $S_n(y) = i/n$, $y_{(i)} \le y < y_{(i+1)}$. As an illustration consider the data (1,2,2,5). Then $S_4(y) = 0$ for $y < 1$, $= 1/4$ for $1 \le y < 2$, $= 3/4$ for $2 \le y < 5$, and $= 1$ for $y \ge 5$.

$H_0: F(y) = F_0(y)$ (the hypothetical cdf with known parameters).

The Kolmogorov test statistic is $D_n = \max|S_n(y) - F_0(y)|$ and $H_0$ is rejected for large values of $D_n$. In this case it is too complicated to compute p-values and a RR approach is simpler. The smallest values for which $H_0$ is rejected (two-sided test) with the significance levels $\alpha = 0.05$ and 0.01 and for various sample sizes $n \ge 1$ can easily be downloaded from the internet. For large n (at least larger than 100) the RR consists of observed values of $D_n$ larger than $1.36/\sqrt{n}$ with $\alpha = 0.05$ and larger than $1.63/\sqrt{n}$ with $\alpha = 0.01$.




EX 91 Test whether the following ranked numbers are generated by a variable that is ~ Uniform[0,1]:
.09 .20 .23 .29 .32 .34 .34 .37 .41 .45 .53 .70 .83 .87 .94 .97

$H_0: F(y) = F_0(y) = y$.

We get the following table: (Notice that in this case $F_0(y_{(i)}) = y_{(i)}$.)

y_(i)   .09     .20     .23     .29     .32     .34     .37
i/n     1/16    2/16    3/16    4/16    5/16    7/16    8/16
        .0625   .1250   .1875   .2500   .3125   .4375   .5000

y_(i)   .41     .45     .53     .70     .83     .87     .94     .97
i/n     9/16    10/16   11/16   12/16   13/16   14/16   15/16   16/16
        .5625   .6250   .6875   .7500   .8125   .8750   .9375   1

The largest value of $|S_n(y) - F_0(y)|$ is $D_{16} = |0.6250 - 0.45| = 0.175$ and this is far below the rejection limit 0.3273 ($\alpha = 0.05$), so there is no reason to reject the null hypothesis.
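The computation in EX 91 can be checked with a short Python sketch. The helper name `kolmogorov_statistic` is ours; it compares $F_0$ with the sample cdf on both sides of every jump, which is a common way to locate the maximum.

```python
def kolmogorov_statistic(sample, cdf):
    """D_n = max |S_n(y) - F0(y)|, checked just before and at each jump of S_n."""
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f0 = cdf(y)
        d = max(d, abs(i / n - f0), abs((i - 1) / n - f0))
    return d

data = [.09, .20, .23, .29, .32, .34, .34, .37,
        .41, .45, .53, .70, .83, .87, .94, .97]
d16 = kolmogorov_statistic(data, lambda y: y)  # F0(y) = y for Uniform[0, 1]
print(round(d16, 4), d16 > 0.3273)             # statistic, and whether it is in the RR
```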


A different problem is to compare two distributions by means of the observations in two independent samples. This was done several times in Ch. 6.2.2-6.2.3 by assuming a specific form of the distributions, e.g. $X \sim N(\mu_X, \sigma^2)$ and $Y \sim N(\mu_Y, \sigma^2)$. Now we will test the hypothesis $H_0: F_X(x) = F_Y(y)$ without specifying the form of the cdfs, which are assumed to be continuous. The latter hypothesis is also termed the hypothesis of homogeneity. The test that is used will be called the Smirnov two-sample test, also called the Kolmogorov-Smirnov two-sample test. N.V. Smirnov (1900-1966) was a great mathematician in the former Soviet Union who won prizes in many areas. (He is said to have won “the bronze star in vodka distillation” in 1940, but this may be a student joke.)


The Smirnov test statistic is $D_{m,n} = \max|S_m(x) - S_n(y)|$, where $S_m(x)$ is the sample cdf from a sample of size m and $S_n(y)$ is the sample cdf from a sample of size n, both evaluated at the same point. $H_0$ is rejected for large values of $D_{m,n}$. Tables for the test can easily be downloaded from the internet. Critical values are given for each pair of m, n (often denoted $n_1, n_2$) and for $\alpha = 0.05$ and $\alpha = 0.01$. For large sample sizes, say above 25, approximate critical values are given by $1.36\sqrt{1/m + 1/n}$ for $\alpha = 0.05$ and $1.63\sqrt{1/m + 1/n}$ for $\alpha = 0.01$.


The Smirnov test should primarily be used when very little is known about the distributional form, but also as a complement to parametric tests in situations where it is suspected that lack of significance in a test might be a result of choosing a bad probability model.
EX 92 Check whether the following two samples of observations on the variables $X_i$ and $Y_i$ are drawn from the same population.

$X_i$: 7.6 8.4 8.6 8.7 9.3 9.9 10.1 10.6 11.2 (m = 9)
$Y_i$: 5.2 5.7 5.9 6.5 6.8 8.2 9.1 9.8 10.8 11.3 11.5 12.3 12.5 13.4 14.6 (n = 15)

We make the following table of ranked observations:

x_(i)    y_(j)    S_X(x) - S_Y(y)
         5.2      0 - 1/15 = -1/15
         5.7      0 - 2/15 = -2/15
         5.9      0 - 3/15 = -3/15
         6.5      0 - 4/15 = -4/15
         6.8      0 - 5/15 = -5/15
7.6               1/9 - 5/15 = -2/9
         8.2      1/9 - 6/15 = -13/45
8.4               2/9 - 6/15 = -8/45
8.6               3/9 - 6/15 = -1/15
8.7               4/9 - 6/15 = 2/45
         9.1      4/9 - 7/15 = -1/45
9.3               5/9 - 7/15 = 4/45
         9.8      5/9 - 8/15 = 1/45
9.9               6/9 - 8/15 = 2/15
10.1              7/9 - 8/15 = 11/45
10.6              8/9 - 8/15 = 16/45
         10.8     8/9 - 9/15 = 13/45
11.2              1 - 9/15 = 2/5 = 0.400
         11.3     1 - 10/15 = 1/3
         11.5     1 - 11/15 = 4/15
         12.3     1 - 12/15 = 1/5
         12.5     1 - 13/15 = 2/15
         13.4     1 - 14/15 = 1/15
         14.6     1 - 1 = 0

We obtain $D_{9,15} = 0.400 = 18/45$. (Tables of critical values of this test often show fractions.) Since $P(D_{9,15} \ge 19/45 \approx 0.422) = 0.20$ it is concluded that the maximal difference isn't large enough to reject the hypothesis of equal distributions. In fact $P(D_{9,15} \ge 0.533) = 0.05$, so a much larger maximal difference would be required to reject the hypothesis.
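The table of EX 92 can be reproduced with the following Python sketch, which works with exact fractions in the same way as the tables of critical values. The function name `smirnov_statistic` is an assumption made for the illustration.

```python
from fractions import Fraction

def smirnov_statistic(xs, ys):
    """D_{m,n} = max |S_m(t) - S_n(t)| over all observed values t."""
    m, n = len(xs), len(ys)
    best = Fraction(0)
    for t in sorted(set(xs) | set(ys)):
        s_x = Fraction(sum(x <= t for x in xs), m)  # sample cdf of the x-series at t
        s_y = Fraction(sum(y <= t for y in ys), n)  # sample cdf of the y-series at t
        best = max(best, abs(s_x - s_y))
    return best

x = [7.6, 8.4, 8.6, 8.7, 9.3, 9.9, 10.1, 10.6, 11.2]
y = [5.2, 5.7, 5.9, 6.5, 6.8, 8.2, 9.1, 9.8, 10.8,
     11.3, 11.5, 12.3, 12.5, 13.4, 14.6]
d = smirnov_statistic(x, y)
print(d, float(d))
```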
The Smirnov test can be used to test the more general hypothesis $H_0: F_1(y) = \dots = F_k(y)$, $k \ge 2$. If all the k sample sizes are large, the p-value for $H_0$ can be computed quite easily, although the computations may be heavy. It is wise to have a computer program that calculates the value of the test statistic. Due to the usability in situations where it is hard to know the population distribution, we outline the test procedure (following the work in Fisz 1963, p. 409).

Let the sample sizes be $n_1, \dots, n_i, \dots, n_k$ and define the constants

$K_2 = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$, $K_3 = \sqrt{\frac{n_3 (n_1 + n_2)}{n_1 + n_2 + n_3}}$, ..., $K_k = \sqrt{\frac{n_k (n_1 + \dots + n_{k-1})}{n_1 + \dots + n_k}}$

Introduce the statistics

$D_2 = \max|S_2(y) - S_1(y)|$, $D_3 = \max\left|S_3(y) - \frac{n_1 S_1(y) + n_2 S_2(y)}{n_1 + n_2}\right|$, ..., $D_k = \max\left|S_k(y) - \sum_{i=1}^{k-1} n_i S_i(y) \Big/ \sum_{i=1}^{k-1} n_i\right|$

Put $A_i = K_i D_i$, $i = 2 \dots k$, and $A_{max} = \max(A_2, \dots, A_k)$. Then the p-value is $1 - (Q(A_{max}))^{k-1}$ where $Q(\lambda)$ is the Kolmogorov-Smirnov $\lambda$-distribution. (Cf. Table VIII in Fisz 1963.)


The Sign Test and The Wilcoxon Signed-Rank test for One Sample


Let M be the population median in a continuous distribution. Then the hypothesis $H_0: F(y) = F_0(y)$ implies the hypothesis $H_0: M = M_0$. E.g. if $Y \sim N(\mu, \sigma^2)$ then $M = \mu$. When data consist of matched pairs $(X_i, Y_i)_{i=1}^{n}$ one can reduce the problem of making inference from two dependent samples to a one-sample problem by considering the differences $D_i = X_i - Y_i$, $i = 1 \dots n$. In this case it is natural to test the hypothesis $H_0: M = 0$ which is equivalent to $H_0: P(X > Y) = P(X < Y) = 1/2$.

The Sign Test for $H_0: M = M_0$ consists of computing the value of the test statistic Y = ‘Number of observations below $M_0$ (if suspiciously few are below) or above $M_0$ (if suspiciously few are above)’. By suspiciously few we mean that they deviate much from the expectation n/2. Under $H_0$ the test statistic is distributed Binomial(n, p = 1/2).


EX 93 Consider the following measurements of body temperature (in degrees Celsius):
37.1 37.0 37.3 37.2 36.9 37.4 36.8 37.1 37.3 37.3 36.9 37.0 37.5 37.2 37.1
Are these data in agreement with the hypothesis that the median body temperature in the population is 37.0?

Since there are just 3 values that are below 37.0 we compute the probability

$P(Y \le 3 \mid p = 1/2) = \sum_{y=0}^{3} \binom{15}{y} (1/2)^y (1/2)^{15-y} = (1/2)^{15} \sum_{y=0}^{3} \binom{15}{y} = \frac{576}{32768} = 0.0176$.

So, the (two-sided) p-value is 2·0.0176 = 0.035 and the hypothesis is rejected.
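The binomial tail used in the sign test is easy to compute exactly. The sketch below follows the example in keeping n = 15 (the two observations equal to 37.0 are not removed); the function name `sign_test_p_value` is ours.

```python
from fractions import Fraction
from math import comb

def sign_test_p_value(n, k):
    """Return P(Y <= k) and the doubled (two-sided) value for Y ~ Binomial(n, 1/2)."""
    lower_tail = Fraction(sum(comb(n, y) for y in range(k + 1)), 2 ** n)
    return lower_tail, min(Fraction(1), 2 * lower_tail)

temps = [37.1, 37.0, 37.3, 37.2, 36.9, 37.4, 36.8, 37.1,
         37.3, 37.3, 36.9, 37.0, 37.5, 37.2, 37.1]
below = sum(t < 37.0 for t in temps)          # observations below the hypothetical median
tail, p_two_sided = sign_test_p_value(len(temps), below)
print(below, tail, round(float(p_two_sided), 4))
```

The same function applies to matched pairs: call it with the number of non-zero differences and the number of negative ones.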
The Sign Test for Matched Pairs is illustrated in the following example.

EX 94 Blood pressure measurements (in millimeters of mercury) were obtained before and after a training program with the following result:

Subject         1       2       3       4       5       6
Before        136.9   201.4   166.8   150.0   173.2   169.3
After         130.2   180.7   149.6   153.2   162.6   160.1
Difference      6.7    20.7    17.2    -3.2    10.6     9.2

Test the hypothesis that the median difference is zero.

Since there is just 1 difference that is negative we consider the variable Y = ‘Number of differences that are negative’ and calculate the probability

$P(Y \le 1 \mid p = 1/2) = \sum_{y=0}^{1} \binom{6}{y} (1/2)^y (1/2)^{6-y} = (1/2)^6 (1 + 6) = 0.109$.

The (two-sided) p-value is 0.22 so the hypothesis can’t be rejected by the sign test.

As a comparison we use Student's T-test (cf. EX 87) for the hypothesis that the mean difference is zero.

$\sum d_i = 61.2$, $\sum d_i^2 = 976.46 \Rightarrow \bar{d} = 10.2$, $s_d^2 = \frac{1}{5}\left(976.46 - (61.2)^2/6\right) = 70.4440$.

$T = \frac{10.2 - 0}{\sqrt{70.4440/6}} = 2.977 \Rightarrow P(T(5) > 2.977) = 0.0155$. So, the (two-sided) p-value is 0.031 and the hypothesis is rejected.


Notice that the latter test is based on the assumption that the observed differences come from a normal distribution. 
It is to be expected that tests that make use of more information about the distribution are more efficient (provided 
that the distributional assumptions are valid). However, in the next example we introduce a nonparametric test that is 
more efficient than the sign test and is nearly as efficient as the T-test. 


The Wilcoxon Signed-Rank Test for Matched-Pairs

We will test $H_0: F_X(x) = F_Y(y)$ based on a sample of matched pairs $(X_i, Y_i)_{i=1}^{n}$. Proceed in the following steps:

- Form the differences $D_i = X_i - Y_i$, $i = 1 \dots n$. Ties, i.e. cases with $D_i = 0$, are eliminated. The ‘working’ sample size after this elimination is denoted n’. It is assumed that the differences are continuous and have a symmetric distribution about 0.

- Rank the absolute differences from the smallest (1) to the largest (n’) and put a + or a - sign above the absolute difference. If two or more absolute differences are tied for the same rank, then the average rank is assigned to each member of the tied group. E.g. the six observations 6<7=7=7=7<8 are given the ranks 1, 3.5, 3.5, 3.5, 3.5, 6 since (2+3+4+5)/4=3.5.


- Put $T^-$ = ‘Rank sum for negative differences’ and $T^+$ = ‘Rank sum for positive differences’. As a test statistic choose $T = \min(T^-, T^+)$ and reject $H_0$ if $T \le T_0$, where $T_0$ is the critical value in the table. Tables are easily downloaded from the internet. A good table can be found in Wackerly et al 2008, Table 9 in Appendix 3. The latter shows critical values for working sample sizes up to 50 together with the p-values 0.10, 0.05, 0.02 and 0.01 (two-sided tests).

p-values can be computed in an exact way but this is complicated. For n > 50 one can use the fact that

$Z = \frac{T^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$

has approximately a N(0,1)-distribution. So, the p-value is $2P(Z > |Z_{obs}|)$.


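EX 95 Consider again the data in EX 94. The following table can be constructed:

Sign       -      +      +      +      +      +
|D_i|     3.2    6.7    9.2   10.6   17.2   20.7
Rank       1      2      3      4      5      6

From this we get $T^- = 1$, $T^+ = 20 \Rightarrow T = \min(1, 20) = 1$. The critical value from the table in Wackerly et al mentioned above is $T_0 = 1$ and this corresponds to a p-value less than 0.05. The exact one-sided p-value is $P(T^- \le 1) = 2/64 = 1/32 \approx 0.0313$, since only 2 of the $2^6 = 64$ equally likely sign patterns give a rank sum of at most 1. This is very close to the value obtained by the T-test in EX 94. The exact two-sided p-value is twice as large, $1/16 = 0.0625$.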
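For small samples the exact null distribution of the signed-rank sum can be found by brute force, since every pattern of signs is equally likely under the null hypothesis. The following Python sketch does this for the data of EX 94; the function names are ours, and the code assumes no zero differences and no tied absolute differences.

```python
from fractions import Fraction
from itertools import product

def signed_rank_sums(diffs):
    """Return (T_minus, T_plus); assumes no zeros and no tied absolute values."""
    ordered = sorted(diffs, key=abs)
    t_minus = sum(r for r, d in enumerate(ordered, start=1) if d < 0)
    t_plus = sum(r for r, d in enumerate(ordered, start=1) if d > 0)
    return t_minus, t_plus

def exact_tail(n, t):
    """P(T_minus <= t) under H0: all 2**n sign patterns are equally likely."""
    hits = sum(
        1 for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(range(1, n + 1), signs) if s) <= t
    )
    return Fraction(hits, 2 ** n)

diffs = [6.7, 20.7, 17.2, -3.2, 10.6, 9.2]
t_minus, t_plus = signed_rank_sums(diffs)
one_sided = exact_tail(len(diffs), min(t_minus, t_plus))
two_sided = min(Fraction(1), 2 * one_sided)
print(t_minus, t_plus, one_sided, two_sided)
```

The enumeration grows as $2^n$, so this approach is only practical for small n; for larger samples the Normal approximation is the natural choice.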
The Mann-Whitney U Test for Two Independent Samples

Also now we test the hypothesis $H_0: F_X(x) = F_Y(y)$, but there are two independent samples $(X_i)_{i=1}^{n_X}$ and $(Y_i)_{i=1}^{n_Y}$, each with iid observations. The distributions may be discrete or continuous. Assume that $n_X \le n_Y$. The test proceeds in the following steps:

- Put all the $n_X + n_Y$ observations together and rank them according to their magnitude, from smallest to largest. Compute the rank sum for the observations that belong to the X-sample and call this W.

- The test statistic is $U = n_X n_Y + n_X(n_X + 1)/2 - W$. Under $H_0$ the distribution of U is symmetric about the expectation $E(U) = n_X n_Y/2$. This in turn implies that $P(U \le u_0) = P(U \ge n_X n_Y - u_0)$.

- $H_0$ is rejected for extremely large or small values of U with two-sided tests. Critical values are obtained from tables. We will show in the example below how p-values can be computed by using Table 8, Appendix 3 in Wackerly et al. The latter gives values of $P(U \le u_0)$ for sample sizes up to 10 and $u_0 = 0, 1, \dots, n_X n_Y/2$.

- For $n_X > 10$ and $n_Y > 10$ it can be shown that $Z = \frac{U - n_X n_Y/2}{\sqrt{n_X n_Y (n_X + n_Y + 1)/12}}$ is close to a N(0,1)-distribution and p-values (two-sided) are obtained from $2 \cdot P(Z > |z_{obs}|)$.
EX 96 Consider the following two independent series of independent observations:

x: 51 32 41 57 47 38 62 44 42 35

y: 61 34 60 59 63 45 49 53 46 58

Test whether the two series come from the same population:

a) By using the Mann-Whitney U Test.
b) By using the Mann-Whitney U Test with Normal approximation.
c) By using the T-test for two independent samples.

a) Put all observations together and rank those observations that are coming from the x-series.

Value   32   34   35   38   41   42   44   45   46   47
Rank     1         3    4    5    6    7             10

Value   49   51   53   57   58   59   60   61   62   63
Rank         12        14                       19

The rank sum is W = 1 + 3 + 4 + 5 + 6 + 7 + 10 + 12 + 14 + 19 = 81 and the test statistic is
U = 10·10 + 10·(10 + 1)/2 - 81 = 74.

The latter value is larger than the expectation, 10·10/2 = 50. The p-value is thus

$2 \cdot P(U \ge 74) = 2 \cdot P(U \le 26)$. [Remember the property of the U-distribution listed above.]

From the exact null distribution of U (as tabulated) we get $P(U \le 26) = 0.0376$. So, the p-value is about 0.075 and the hypothesis of similar populations can’t be rejected at the 5% level.

b) The observed value of Z is $\frac{74 - 10 \cdot 10/2}{\sqrt{10 \cdot 10 \cdot (10 + 10 + 1)/12}} = 1.814$ and the p-value is $2 \cdot P(Z > 1.814) = 2 \cdot 0.0348 = 0.0697$, close to the p-value obtained in a).

c) $\sum x_i = 449$, $\sum x_i^2 = 20977$, $\bar{x} = 44.9$, $s_x^2 = \frac{1}{10-1}\left(20977 - (449)^2/10\right) = 90.77$

$\sum y_i = 528$, $\sum y_i^2 = 28642$, $\bar{y} = 52.8$, $s_y^2 = \frac{1}{10-1}\left(28642 - (528)^2/10\right) = 84.84$

First we test $H_0: \sigma_x^2 = \sigma_y^2$ [Cf. EX 88 a)].

$\frac{s_x^2}{s_y^2} = 1.07 \Rightarrow$ p-value $= 2 \cdot P(F(9,9) > 1.07) = 2 \cdot 0.46 = 0.92$. There is no reason to reject the null hypothesis and we can pool the two variances

$s^2 = \frac{(10-1) \cdot 90.77 + (10-1) \cdot 84.84}{10 + 10 - 2} = 87.81$.

EX 96 (Continued) We then test $H_0: \mu_x = \mu_y$ [Cf. EX 88 b)].

$T = \frac{44.9 - 52.8}{\sqrt{87.81(1/10 + 1/10)}} = -1.885 \Rightarrow$ p-value $= 2 \cdot P(T(10 + 10 - 2) > 1.885) = 2 \cdot 0.038 = 0.076$.

According to this test we can't reject $H_0$ at the 5% level.

The three tests thus give nearly the same p-value (0.070 to 0.076), and none of them rejects $H_0$ at the 5% level. The T-test rests on the normality assumption, which may be violated here. A histogram of the y-values gives the following pattern:

y            25-35   35-45   45-55   55-65
Frequency      1       1       3       5

The histogram suggests that the y-values are sampled from a population with a skew distribution rather than a normal distribution, although the sample size is too small to verify this.

In such situations non-parametric methods are attractive, since these are not, or to a lesser extent, dependent on the form of the population distribution.
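Rank sums are easy to get wrong by hand, so it can be worthwhile to let a program do the ranking. The Python sketch below computes W, U and Z for the data of EX 96 and obtains the exact tail probability of U by counting rank subsets. The function names are ours and the code assumes that there are no tied observations.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney(xs, ys):
    """Return (W, U, Z) as defined in the text; assumes no tied observations."""
    nx, ny = len(xs), len(ys)
    ranks = {v: r for r, v in enumerate(sorted(xs + ys), start=1)}
    w = sum(ranks[v] for v in xs)                       # rank sum of the x-sample
    u = nx * ny + nx * (nx + 1) // 2 - w
    z = (u - nx * ny / 2) / sqrt(nx * ny * (nx + ny + 1) / 12)
    return w, u, z

def exact_lower_tail(nx, ny, u0):
    """P(U <= u0) under H0, counting nx-subsets of the ranks by their rank sum."""
    n = nx + ny
    max_sum = sum(range(ny + 1, n + 1))
    # ways[k][s] = number of k-subsets of the ranks processed so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(nx + 1)]
    ways[0][0] = 1
    for rank in range(1, n + 1):
        for k in range(min(rank, nx), 0, -1):
            for s in range(max_sum, rank - 1, -1):
                ways[k][s] += ways[k - 1][s - rank]
    # U <= u0 is the same event as W >= nx*ny + nx*(nx+1)/2 - u0
    w_min = max(0, nx * ny + nx * (nx + 1) // 2 - u0)
    return sum(ways[nx][w_min:]) / sum(ways[nx])

x = [51, 32, 41, 57, 47, 38, 62, 44, 42, 35]
y = [61, 34, 60, 59, 63, 45, 49, 53, 46, 58]
w, u, z = mann_whitney(x, y)
p_exact = 2 * exact_lower_tail(len(x), len(y), min(u, len(x) * len(y) - u))
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))
print(w, u, round(z, 3), round(p_exact, 4), round(p_normal, 4))
```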


Fisher’s Exact Test of Independency 


In EX 70 and EX 79 it was shown how independency could be tested in contingency tables, by the chi-square principle and the LR principle, respectively. Both these tests require that the sample size n is large and p-values are computed by using the asymptotic distribution. In small samples one can use a test termed Fisher's exact test to test for independence. Using the same notation for the cell frequencies in the 2 x 2 table as in Ch. 6.2.1, we calculate the probability of a certain outcome in the four cells, given that the margins are fixed, from the expression

$P(y_{11}, y_{12}, y_{21}, y_{22}) = \frac{y_{1\cdot}!\, y_{2\cdot}!\, y_{\cdot 1}!\, y_{\cdot 2}!}{n!\, y_{11}!\, y_{12}!\, y_{21}!\, y_{22}!}$ (27)

The p-value is obtained by calculating the sum of probabilities of all outcomes in the 2 x 2 table that are more extreme or equal to the observed outcome. This is illustrated in EX 97 below.

This test was first suggested by R.A. Fisher who discussed an experimental investigation of a lady’s claim to be able to tell by taste whether the tea was added to the milk or the milk was added to the tea (‘the tea drinking lady experiment’). In that case all margins were fixed since there were 4 cups of each type and the lady was informed about this fact. However, the test is used also in situations where just one margin is fixed, and even when only the total n is fixed. In the last case it is an example of a conditional test where we condition on the margins in the present data, although we are aware that the margins will vary randomly from sample to sample.
EX 97 The following 2 x 2 table shows the frequencies of the two variables lower back pain (yes/no) and sex (males and females).

                 Lower back pain
                 Yes     No
Sex   Male        6       2
      Female     11      19

a) Use Fisher’s exact test to investigate whether the two variables are independent.
b) Answer the same question by using the Chi-square principle and the LR principle.

a) Construct tables with actual and more extreme outcomes given fixed margins:

  6 |  2 |  8         7 |  1 |  8         8 |  0 |  8
 11 | 19 | 30        10 | 20 | 30         9 | 21 | 30
 17 | 21 | 38        17 | 21 | 38        17 | 21 | 38
 Table A             Table B             Table C

According to Eq. (27) the probability of the outcome in Table A is $\frac{8!\,30!\,17!\,21!}{38!\,6!\,2!\,11!\,19!}$. The sum of the probabilities of the outcomes in Table A to Table C is

$\frac{8!\,30!\,17!\,21!}{38!}\left(\frac{1}{6!\,2!\,11!\,19!} + \frac{1}{7!\,1!\,10!\,20!} + \frac{1}{8!\,0!\,9!\,21!}\right) = 0.0620$.

This is the p-value for a one-sided test. The p-value for a two-sided test is usually found by doubling the one-sided value (McCullagh & Nelder 1983, p99) and this is the simplest alternative, but there are also other more complicated possibilities (Rao 1965, p345). The two-sided p-value is thus 0.1240.

The calculation of exact probabilities is tedious, but tables can be downloaded from the internet and there are even online calculators, e.g. ‘Free Fisher's Exact Test Calculator’ which can be found on www.danielsoper.com . In SAS p-values are obtained from proc freq by adding /chisq (see SAS manuals for details). The p-values obtained may vary slightly depending on which alternative is used.

b) The Chi-square principle gives (cf. EX 70)

$X^2 = 1.6378 + 1.3258 + 0.4367 + 0.3536 = 3.7539$, p-value $= P(\chi^2(1) > 3.7539) = 0.0527$. The latter is a two-sided p-value that differs very much from the value 0.1240 obtained in a). It is obvious that the sample size n = 38 is too small for the Chi-square approximation to be valid. One simple way to deal with the problem is to use Yates correction for continuity (Yates 1934, p217)

$X_c^2 = \frac{n\,(|y_{11} y_{22} - y_{12} y_{21}| - n/2)^2}{y_{1\cdot}\, y_{2\cdot}\, y_{\cdot 1}\, y_{\cdot 2}} = \frac{38\,(|6 \cdot 19 - 2 \cdot 11| - 38/2)^2}{8 \cdot 30 \cdot 17 \cdot 21} = 2.3635$, p-value $= P(\chi^2(1) > 2.3635) = 0.1242$,

very close to the p-value in a).

The LR principle gives (cf. EX 81)
$\hat{p}_{1\cdot} = 8/38$, $\hat{p}_{2\cdot} = 30/38$, $\hat{p}_{\cdot 1} = 17/38$, $\hat{p}_{\cdot 2} = 21/38$, $\hat{p}_{11} = 6/38$, $\hat{p}_{12} = 2/38$, $\hat{p}_{21} = 11/38$, $\hat{p}_{22} = 19/38 \Rightarrow -2\log\Lambda = -2(-3.10014 + 1.58646 + 2.18822 - 2.58980) = 3.83052$,

p-value $= P(\chi^2(1) > 3.83052) = 0.05033$. This is close to 0.0527 obtained with the Chi-square principle. It seems to be unknown how to obtain a corrected test statistic in this case.
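The exact probabilities in part a) and the chi-square tail probabilities in part b) can be verified with the Python sketch below. It uses the fact that expression (27) is a hypergeometric probability and that a $\chi^2(1)$-variable is the square of a N(0,1)-variable; the function names are ours.

```python
from math import comb, erfc, sqrt

def table_probability(y11, row1, row2, col1):
    """Probability (27) of a 2 x 2 table with fixed margins, in hypergeometric form."""
    return comb(row1, y11) * comb(row2, col1 - y11) / comb(row1 + row2, col1)

def fisher_one_sided(y11, row1, row2, col1):
    """Sum of (27) over all tables whose first cell is at least the observed y11."""
    upper = min(row1, col1)
    return sum(table_probability(k, row1, row2, col1) for k in range(y11, upper + 1))

def chi2_1_upper_tail(x):
    """P(chi-square(1) > x) = P(|Z| > sqrt(x))."""
    return erfc(sqrt(x / 2))

p_one = fisher_one_sided(6, 8, 30, 17)   # Tables A, B and C of EX 97
print(round(p_one, 4), round(2 * p_one, 4))
print(round(chi2_1_upper_tail(3.7539), 4),   # Chi-square principle
      round(chi2_1_upper_tail(2.3635), 4),   # with Yates correction
      round(chi2_1_upper_tail(3.83052), 4))  # LR principle
```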
Rank correlation

Pearson’s correlation coefficient for the correlation between two variables (cf. the section about properties of $\rho(Y_1, Y_2)$ in Ch. 2.1) is estimated by the sample correlation coefficient defined as $r = s_{xy}/(s_x s_y)$, where $(n-1)s_x^2 = \sum x_i^2 - (\sum x_i)^2/n$, $(n-1)s_y^2 = \sum y_i^2 - (\sum y_i)^2/n$ and $(n-1)s_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$. However, the calculation of r requires that the scale of the variables is at least at an interval level, i.e. that the operations addition and subtraction make sense. For ordinal (rank order) data one may define other measures of correlation. The simplest is Spearman’s coefficient $r_S$. Let $R(x_i)$ be the rank of $x_i$ among $x_1 \dots x_n$ and let $R(y_i)$ be the rank of $y_i$ among $y_1 \dots y_n$, then $r_S$ is defined as r but with $x_i$ and $y_i$ replaced by $R(x_i)$ and $R(y_i)$. Tied observations are given average ranks, in the same way as in the rank tests above. If there are no ties in either the x or the y observations the computation of $r_S$ can be simplified, $r_S = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$ where $d_i = R(x_i) - R(y_i)$.

$r_S$ can be used to test the hypothesis of no association between two variables in situations where it isn’t possible to obtain precise measurements, but only ranked values. In two-sided tests the null hypothesis shall be rejected for large or small values of $r_S$ (remember that $-1 \le r_S \le 1$). Critical values are found in tables, e.g. Table 11, Appendix 3 in Wackerly et al 2007. Tables can be easily downloaded from the internet.


EX 98 A person was asked to make an assessment about the ability of 10 subjects and rank them. The ability of the subjects was then evaluated in a formal test. The result was

Rank according to Test (x)          1   2   3   4   5   6   7   8   9   10
Rank according to Assessment (y)    3   4   1   5   6   8   2   10  7   9

We find $\sum d_i^2 = 52 \Rightarrow r_S = 1 - \frac{6}{10 \cdot (100 - 1)} \cdot 52 = 0.685$. This is quite large, but is it large enough in order to reject the hypothesis of no association between the ranked series? Referring to Table 11 mentioned earlier, one finds the critical value 0.648 for $\alpha = 0.025$ (one-sided test) and $\alpha = 0.05$ (two-sided test). The conclusion is that there is a significant association between the two series (p<0.05, Spearman’s rank correlation).
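The simplified formula for Spearman's coefficient takes only a few lines of Python. The sketch below is valid only when there are no ties, as in EX 98; the function name `spearman_no_ties` is ours.

```python
from fractions import Fraction

def spearman_no_ties(rank_x, rank_y):
    """Return sum(d_i^2) and r_S = 1 - 6*sum(d_i^2) / (n*(n^2 - 1)); no ties allowed."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return d2, 1 - Fraction(6 * d2, n * (n * n - 1))

test_rank = list(range(1, 11))
assessment_rank = [3, 4, 1, 5, 6, 8, 2, 10, 7, 9]
d2, r_s = spearman_no_ties(test_rank, assessment_rank)
print(d2, round(float(r_s), 3), float(r_s) > 0.648)  # last value: is r_S beyond the critical value?
```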


6.3 The power of normally distributed statistics

Let T be a statistic with mean $\theta$ and variance $\sigma^2(\theta)/n$. The variance is thus allowed to be a function of the mean. An example of this is the sample proportion $\hat{p} = Y/n$, where $Y \sim$ Binomial(n, p), with mean p and variance $p(1-p)/n$. In this section it is assumed that T is normally distributed or at least that n is so large that $Z = \frac{T - \theta}{\sigma(\theta)/\sqrt{n}}$ can be assumed to be distributed N(0,1). In the examples below we first show a general expression for the power and then we consider some special cases.
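EX 99 A general expression for the power.
Find the RR for testing $H_0: \theta = \theta_0$ against $H_a: \theta \ne \theta_0$ when the type I error is $\alpha$, and determine the power.

The RR is obviously of the form $|T - \theta_0| > C_\alpha$, i.e. $(T - \theta_0) > C_\alpha$ or $(T - \theta_0) < -C_\alpha$, where $C_\alpha$ is a constant that depends on $\alpha$.

$\alpha = P(\text{Reject } H_0 \mid H_0) = P(T - \theta_0 > C_\alpha \mid H_0) + P(T - \theta_0 < -C_\alpha \mid H_0)$ = [Do the same operations on both sides of the inequality sign.]

$= P\left(\frac{T - \theta_0}{\sigma(\theta_0)/\sqrt{n}} > \frac{C_\alpha}{\sigma(\theta_0)/\sqrt{n}}\right) + P\left(\frac{T - \theta_0}{\sigma(\theta_0)/\sqrt{n}} < -\frac{C_\alpha}{\sigma(\theta_0)/\sqrt{n}}\right) = P\left(Z > \frac{C_\alpha}{\sigma(\theta_0)/\sqrt{n}}\right) + P\left(Z < -\frac{C_\alpha}{\sigma(\theta_0)/\sqrt{n}}\right)$ = [Due to symmetry.] $= 2 \cdot P\left(Z > \frac{C_\alpha}{\sigma(\theta_0)/\sqrt{n}}\right)$

In the sequel we choose $\alpha = 0.05$, so $\frac{C_{0.05}}{\sigma(\theta_0)/\sqrt{n}} = 1.96 \Rightarrow C_{0.05} = 1.96 \cdot \sigma(\theta_0)/\sqrt{n}$.

The RR, with $\alpha = 0.05$, is $|T - \theta_0| > 1.96 \cdot \sigma(\theta_0)/\sqrt{n}$ (28a)

The power is $Pow(\theta) = P(\text{Reject } H_0) = P(T > \theta_0 + 1.96 \cdot \sigma(\theta_0)/\sqrt{n}) + P(T < \theta_0 - 1.96 \cdot \sigma(\theta_0)/\sqrt{n})$ = [Do the same operations on both sides of the inequality sign.]

$= P\left(\frac{T - \theta}{\sigma(\theta)/\sqrt{n}} > \frac{\theta_0 + 1.96 \cdot \sigma(\theta_0)/\sqrt{n} - \theta}{\sigma(\theta)/\sqrt{n}}\right) + P\left(\frac{T - \theta}{\sigma(\theta)/\sqrt{n}} < \frac{\theta_0 - 1.96 \cdot \sigma(\theta_0)/\sqrt{n} - \theta}{\sigma(\theta)/\sqrt{n}}\right)$

From this we get

$Pow(\theta) = P\left(Z > 1.96\frac{\sigma(\theta_0)}{\sigma(\theta)} - \frac{(\theta - \theta_0)}{\sigma(\theta)}\sqrt{n}\right) + P\left(Z < -1.96\frac{\sigma(\theta_0)}{\sigma(\theta)} - \frac{(\theta - \theta_0)}{\sigma(\theta)}\sqrt{n}\right)$ (28b)

In (28a) and (28b) $\alpha = 0.05$. If it is very important to not falsely reject the null hypothesis one should choose a lower type I error. E.g. with $\alpha = 0.01$ the figure 1.96 is replaced by 2.575.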
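Expression (28b) translates directly into a small function. The Python sketch below takes the standard deviation function $\sigma(\theta)$ as an argument; as an illustrative special case it is evaluated for a Binomial proportion with $\theta_0 = 1/2$ and $\sigma(p) = \sqrt{p(1-p)}$. The names `power` and `binom_sigma` are ours.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power(theta, theta0, sigma, n, z_crit=1.96):
    """Power (28b); sigma is the function theta -> sigma(theta)."""
    a = z_crit * sigma(theta0) / sigma(theta)
    b = (theta - theta0) * sqrt(n) / sigma(theta)
    return (1 - Z.cdf(a - b)) + Z.cdf(-a - b)

def binom_sigma(p):
    """sigma(theta) for a sample proportion, where theta = p."""
    return sqrt(p * (1 - p))

for p in (0.3, 0.4, 0.5):
    print(p, round(power(p, 0.5, binom_sigma, 50), 5),
          round(power(p, 0.5, binom_sigma, 100), 5))
```

At $\theta = \theta_0$ the function returns the type I error, which is a convenient check of an implementation.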
EX 100 Find a RR for testing $H_0: p = 1/2$ against $H_a: p \ne 1/2$, where p is a Binomial proportion, and study the power.

In EX 23 b) it was shown that $\frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$ can be assumed to be distributed N(0,1) for large n. Here $E(\hat{p}) = p$ and $V(\hat{p}) = p(1-p)/n$, so in this case $\theta = p$ and $\sigma^2(\theta) = p(1-p)$. Putting this into Eq. (28a) gives the following RR:

$|\hat{p} - 1/2| > 1.96 \frac{\sqrt{(1/2)(1/2)}}{\sqrt{n}} = \frac{0.98}{\sqrt{n}}$

The power is from (28b):

$Pow(p) = P\left(Z > \frac{1.96}{2\sqrt{p(1-p)}} - \frac{(p - 1/2)}{\sqrt{p(1-p)}}\sqrt{n}\right) + P\left(Z < -\frac{1.96}{2\sqrt{p(1-p)}} - \frac{(p - 1/2)}{\sqrt{p(1-p)}}\sqrt{n}\right)$.

The behavior of this power is studied when n = 50 (pow1) and when n = 100 (pow2). The following program code (in SAS) computes the power; the shapes of the power curves are depicted in Figure 2.

data ppow;
  n1=50; n2=100;
  do p=0.1 to 0.9 by 0.1;
    /* A = 1.96*sigma(theta0)/sigma(theta), where sigma(theta0) = 1/2 */
    A=1.96/2/sqrt(p*(1-p));
    /* B = (theta - theta0)*sqrt(n)/sigma(theta) for the two sample sizes */
    B1=(p-1/2)*sqrt(n1)/sqrt(p*(1-p));
    B2=(p-1/2)*sqrt(n2)/sqrt(p*(1-p));
    /* power (28b): P(Z > A-B) + P(Z < -A-B) */
    pow1=1-probnorm(A-B1)+probnorm(-A-B1);
    pow2=1-probnorm(A-B2)+probnorm(-A-B2);
    output;
  end;
proc print; var p pow1 pow2;
run;

Obs    p      pow1      pow2
1     0.1    1.00000   1.00000
2     0.2    0.99784   1.00000
3     0.3    0.82832   0.98699
4     0.4    0.28904   0.51631
5     0.5    0.05000   0.05000
6     0.6    0.28904   0.51631
7     0.7    0.82832   0.98699
8     0.8    0.99784   1.00000
9     0.9    1.00000   1.00000
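EX 101 Let $(Y_i)_{i=1}^{n}$ be iid observations where $Y_i \sim$ Exponential($\lambda$). Construct a RR for testing $H_0: \lambda = \lambda_0$ against $H_a: \lambda \ne \lambda_0$ when n is so large that the CLT is applicable. Also, study the power and notice what happens if n is too small for the normality approximation to be valid, say n = 10.

$E(Y_i) = 1/\lambda$, $V(Y_i) = 1/\lambda^2 \Rightarrow E(\bar{Y}) = 1/\lambda$, $V(\bar{Y}) = \frac{1/\lambda^2}{n}$, so in this case $\theta = 1/\lambda$ and $\sigma^2(\theta) = 1/\lambda^2$. For large n, $\frac{\bar{Y} - 1/\lambda}{(1/\lambda)/\sqrt{n}}$ is distributed N(0,1).

(28a) gives the RR: $|\bar{Y} - 1/\lambda_0| > 1.96 \frac{1/\lambda_0}{\sqrt{n}}$

(28b) gives the power, where we notice that $\frac{\sigma(\theta_0)}{\sigma(\theta)} = \frac{1/\lambda_0}{1/\lambda} = \frac{\lambda}{\lambda_0}$ and $\frac{\theta - \theta_0}{\sigma(\theta)} = \frac{1/\lambda - 1/\lambda_0}{1/\lambda} = 1 - \frac{\lambda}{\lambda_0}$. It turns out that the power is a function of $\lambda/\lambda_0$:

$Pow(\lambda/\lambda_0) = P\left(Z > 1.96 \cdot (\lambda/\lambda_0) - (1 - (\lambda/\lambda_0))\sqrt{n}\right) + P\left(Z < -1.96 \cdot (\lambda/\lambda_0) - (1 - (\lambda/\lambda_0))\sqrt{n}\right)$.

The latter expression is obtained under the assumption that n is large. When n = 10 the power is illustrated in Figure 3 above. It is seen that the power is very weak for $r = \lambda/\lambda_0 > 1$ and perhaps more interesting is that the probability of rejecting $H_0$ can be smaller for r > 1 than for r = 1 (the value under $H_0$, where it equals the type I error). Such a test is called biased.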
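The biasedness can be seen numerically by evaluating the power expression of EX 101 for a few values of $r = \lambda/\lambda_0$. The Python sketch below simply evaluates the large-sample formula at n = 10 and n = 100; whether Figure 3 was produced in exactly this way is not stated in the text, and the function name `exp_power` is ours.

```python
from math import sqrt
from statistics import NormalDist

def exp_power(r, n, z_crit=1.96):
    """Normal-approximation power of EX 101 as a function of r = lambda/lambda0."""
    z = NormalDist()
    shift = (1 - r) * sqrt(n)
    return (1 - z.cdf(z_crit * r - shift)) + z.cdf(-z_crit * r - shift)

for r in (0.5, 0.8, 1.0, 1.2, 1.5, 2.0):
    print(r, round(exp_power(r, 10), 4), round(exp_power(r, 100), 4))
```

With n = 10 the value at r = 1.2 falls below the value at r = 1, which is the behavior described above.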


6.4 Adjusted p-values for simultaneous inference 


We have been told to reject the null hypothesis when the p-value is small (less than 0.05). In this case there is just one hypothesis to test. When we increase the number of hypotheses we increase the chance to reject at least one of the hypotheses when it’s true. If $\alpha$ is the type I error for testing a single hypothesis, one has to require smaller individual p-values so that the significance level of a whole family of hypotheses is (at most) $\alpha$. When testing m hypotheses simultaneously the Italian statistician Bonferroni suggested that each hypothesis is tested at the level $\alpha/m$. This advice had the drawback that extremely small individual p-values could be needed. An improved method was later suggested by Holm (1979, p. 65). The method can be described in the following way: If there are m simultaneous hypotheses to be tested, rank the p-values from the tests, from the smallest to the largest, $p_{(1)} \le p_{(2)} \le \dots \le p_{(i)} \le \dots \le p_{(m)}$. Then, starting with the smallest p-value and stopping at the first one that fails the condition, claim simultaneous significance for all p-values such that $p_{(i)} \le \frac{\alpha}{m - i + 1}$. The method is illustrated in the following example.


EX 102 The following table shows the durations of a sick leave for various age groups.

                       Duration (Weeks)
                     -1     1-4     4-    Total
Age       -30        48     32      12      92
Group     30-50      35     26      40     101
(Years)   50-        12     24      52      88
          Total      95     82     104     281

Is there an association between Age and Duration, and in such a case, which combinations can explain it?

The total chi-square is $X^2 = 47.35 \Rightarrow$ p-value $= P(\chi^2(4) > 47.35) \approx 10^{-9}$, so the association is very strong. In order to search for an explanation to this we present a table with the measures Deviation / Cell Chi-Square (cf. Ch. 6.2.1).

                       Duration (Weeks)
                     -1             1-4           4-
Age       -30        16.9 / 9.2     5.2 / 1.0     -22.1 / 14.3
Group     30-50      0.9 / 0.0      -3.5 / 0.4    2.6 / 0.2
(Years)   50-        -17.8 / 10.6   -1.7 / 0.1    19.4 / 11.6

There are four Cell Chi-Square measures that are relatively large so we rank their corresponding p-values. The latter are obtained from a table showing $X^2$ and $p = P(\chi^2(1) > X^2)$.

i                           1        2        3        4
Cell Chi-Square, X^2       14.3     11.6     10.6      9.2
p-value                   0.0002   0.0007   0.0011   0.0024
0.05/(3·3 - i + 1)        0.0056   0.0063   0.0071   0.0083

EX 102 (Continued) Here all p-values in the third row are smaller than the values in the fourth row. So, in the corresponding cells there are simultaneous significant deviations (at the 5% level). The conclusion is that there is an over-representation of members in the youngest age group with a short sick leave and also an over-representation of members in the oldest age group with a long sick leave.

Notice that if we want simultaneous significance at the 1% level, there are only three cells that meet the requirement. For the cell with $X^2 = 9.2$ one gets the p-value $0.0024 > \frac{0.01}{3 \cdot 3 - 4 + 1} = 0.0017$.
There are other ways to adjust for multiple comparisons. E.g. when testing for pairwise equality of three or more means, one may apply the methods of Scheffé or Tukey. These are used within the field of Analysis of Variance (ANOVA) and require that many assumptions are met. The so-called Holm-Bonferroni method just described has the advantage that it can be used generally, although more specialized methods may be more efficient in certain situations.
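The Holm-Bonferroni procedure is simple to program. The Python sketch below applies it to the four smallest p-values of EX 102 within a family of m = 9 hypotheses; the remaining five p-values are larger and therefore do not affect the ranks 1 to 4. The function name `holm` is ours.

```python
def holm(p_values, m, alpha=0.05):
    """Step-down Holm procedure; m is the size of the whole family of hypotheses.

    Returns (p, limit, significant) for each supplied p-value in ranked order.
    """
    results = []
    still_rejecting = True
    for i, p in enumerate(sorted(p_values), start=1):
        limit = alpha / (m - i + 1)
        # once a p-value exceeds its limit, no larger p-value is declared significant
        still_rejecting = still_rejecting and p <= limit
        results.append((p, limit, still_rejecting))
    return results

cell_p = [0.0002, 0.0007, 0.0011, 0.0024]
for level in (0.05, 0.01):
    print(level, [(p, round(limit, 4), ok) for p, limit, ok in holm(cell_p, m=9, alpha=level)])
```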
There is no clear-cut answer to the question ‘How many, and which hypotheses shall be considered in the simultaneous inference?’. When testing for significant individual cell deviations in an R x C contingency table it is quite natural to set up R x C hypotheses. In other cases it may be harder to reach a decision on this issue.


6.5 Randomized tests 


In EX 69 it was noticed that with a discrete distribution, such as the Binomial, one can't expect that the type I error is exactly $\alpha$. However, this can be achieved by introducing a further random component to the RR. The methodology is illustrated in the following frequently cited example. (Observe that a randomized test is not to be confused with a randomization test.)


EX 103 Let $(Y_i)_{i=1}^{n}$ be iid with $Y_i \sim$ Poisson($\lambda$). We want to test $H_0: \lambda = 0.1$ against the one-sided alternative $H_a: \lambda > 0.1$ with a type I error ($\alpha$) of 0.05 and with n = 10.

As a test statistic we take $\sum Y_i$ and reject $H_0$ if $\sum Y_i > c$. This choice of RR seems obvious, but can also be shown to follow from the Neyman-Pearson lemma. Since $\sum Y_i \sim$ Poisson($n\lambda$) (cf. (3) in Ch. 3.1) we get, since $n\lambda = 10 \cdot 0.1 = 1$:

$\alpha = P\left(\sum Y_i > c \mid H_0\right) = \sum_{y=c+1}^{\infty} \frac{1^y e^{-1}}{y!} = 1 - e^{-1}\sum_{y=0}^{c} \frac{1}{y!}$. From this we can construct the following table:

c        0       1       2       3
alpha    0.63    0.26    0.08    0.02

Since we can't find a value of c which gives $\alpha = 0.05$ we reformulate the RR in the following way:

If $\sum Y_i > 3$, reject $H_0$ with probability 1
If $\sum Y_i = 3$, reject $H_0$ with probability P

Now, $0.05 = P\left(\sum Y_i > 3 \mid H_0\right) \cdot 1 + P\left(\sum Y_i = 3 \mid H_0\right) \cdot P = 0.02 \cdot 1 + 0.0613 \cdot P$

Thus $P = \frac{0.05 - 0.02}{0.0613} = 0.49 \approx 0.5$.

In practice this means that if $\sum Y_i > 3$ then $H_0$ is rejected. But if $\sum Y_i = 3$ it is not clear if $H_0$ should be rejected until you have tossed a coin where e.g. the outcome ‘head’ means rejection.
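The table of tail probabilities and the randomization probability can be computed as in the Python sketch below. It works with unrounded probabilities, so the resulting value of P differs slightly from the hand calculation based on the rounded figure 0.02. The function names are ours.

```python
from math import exp, factorial

def poisson_pmf(y, mean):
    """P(Y = y) for Y ~ Poisson(mean)."""
    return mean ** y * exp(-mean) / factorial(y)

def randomization_probability(mean, c, alpha=0.05):
    """Return P(Y > c) and the P solving P(Y > c) + P * P(Y = c) = alpha."""
    upper_tail = 1 - sum(poisson_pmf(y, mean) for y in range(c + 1))
    return upper_tail, (alpha - upper_tail) / poisson_pmf(c, mean)

for c in range(4):
    print(c, round(randomization_probability(1.0, c)[0], 4))  # type I error of 'reject if sum > c'

tail, p_rand = randomization_probability(1.0, 3)
print(round(tail, 4), round(p_rand, 3))
```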


The above example with P = 0.5 has inspired a lot of jokers to make fun of theoretical statistical inference. An example: ‘Patient: - Am I going to die of cancer? Statistician: - I just got the result from the lab but wait, first I have to toss a coin to decide about your future.’ Randomized tests are not to be used in practice for several apparent reasons. But, there is one important application for randomized tests, and that is when the power functions of several discretely distributed test statistics are to be compared. In that case it is important that all the power curves start at the same level.
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6.6 Some tests for linear models 
6.6.1 The Gauss-Markov model 


Let (X, Y) be a bivariate random variable (cf. the properties (1)-(10) in Ch. 2.2.1). The conditional
expectation $E(Y \mid X = x)$ is called the regression function for the regression of Y on X and the conditional
variance $V(Y \mid X = x)$ is called the residual variance. We will use the following notations for the population
parameters:

$E(X) = \mu_X$, $E(Y) = \mu_Y$, $V(X) = \sigma_X^2$, $V(Y) = \sigma_Y^2$, $Cov(X,Y) = \sigma_{XY}$, population correlation $\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$,
$E(Y \mid X = x) = \mu_{Y|x} = \alpha + \beta x$ if linear, $V(Y \mid X = x) = \sigma^2$ if constant.

An important special case is when (X, Y) has a bivariate Normal distribution. In that case

$$E(Y \mid X = x) = \alpha + \beta x, \text{ with } \beta = \rho \frac{\sigma_Y}{\sigma_X} = \frac{\sigma_{XY}}{\sigma_X^2}, \ \alpha = \mu_Y - \beta \mu_X, \quad V(Y \mid X = x) = \sigma^2 = \sigma_Y^2 (1 - \rho^2) \qquad (29)$$

In the Gauss-Markov model (Rao 1965, p179) the following assumptions are made:

$(Y_i \mid x_i)_{i=1}^{n}$ are independent and $\sim N(\alpha + \beta x_i, \sigma^2)$. This can alternatively be expressed as

$$Y_i = \alpha + \beta x_i + E_i, \text{ where } (E_i)_{i=1}^{n} \text{ are iid and } \sim N(0, \sigma^2) \qquad (30)$$


The model in (30) is quite restrictive. It involves independence, linearity, constant variance and Normality.
The model is not suitable for follow-up, or panel, data, where measurements are taken from several
subjects that are followed in time. In the special case when $\alpha = 0$ the model is called regression through
the origin (cf. EX 54).


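Corresponding to the population moments above there are sample moments.

$$S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}, \quad S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}, \quad S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$$

sample correlation $r = \frac{S_{xy}}{S_x S_y}$ where $S_x = \sqrt{S_{xx}}$ and $S_y = \sqrt{S_{yy}}$, estimators $\hat\beta = \frac{S_{xy}}{S_{xx}}$, $\hat\alpha = \bar y - \hat\beta \bar x$, $\hat\sigma^2 = \frac{SSE}{n-2}$, where
$SSE = \sum (y_i - \hat\alpha - \hat\beta x_i)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$ (cf. EX 83) is the 'sum of squares for errors'. From the assumptions
in (30) it follows that $\hat\beta \sim N\left(\beta, \frac{\sigma^2}{S_{xx}}\right)$, $\hat\alpha \sim N\left(\alpha, \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)\right)$, $\hat\sigma^2 \sim \sigma^2 \frac{\chi^2(n-2)}{n-2}$. Furthermore, $\hat\beta$ and $\hat\alpha$ are
independent of $\hat\sigma^2$ and $Cov(\hat\alpha, \hat\beta) = -\sigma^2 \bar x / S_{xx}$.

EX 104

a) Show how to test $H_0: \beta = \beta_0$ against $H_a: \beta \neq \beta_0$ and $H_0: \alpha = \alpha_0$ against $H_a: \alpha \neq \alpha_0$.
b) The following data show the relation between Body weight in kg (x) and Body volume in liter (y) for twelve
4-year old boys:

x | 17.1 | 10.5 | 13.8 | 15.7 | 11.9 | 10.4 | 15.0 | 16.0 | 17.8 | 15.8 | 15.1 | 12.1
y | 16.1 | 10.4 | 13.5 | 15.9 | 11.6 | 10.2 | 14.1 | 15.8 | 17.6 | 15.5 | 14.8 | 11.9

Test the following hypotheses $H_0: \beta = 1$ against $H_a: \beta \neq 1$ and $H_0: \alpha = 0$ against $H_a: \alpha \neq 0$ by considering
the regression of Y on x.

c) Repeat the analysis in b) but now by considering the regression of X on y.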


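a) $\frac{\hat\beta - \beta_0}{\sqrt{\sigma^2 / S_{xx}}} \sim N(0,1)$ and $\frac{\hat\sigma^2}{\sigma^2} \sim \frac{\chi^2(n-2)}{n-2}$, so

$$T = \frac{\hat\beta - \beta_0}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{(\hat\beta - \beta_0)/\sqrt{\sigma^2 / S_{xx}}}{\sqrt{\hat\sigma^2 / \sigma^2}} \sim \frac{N(0,1)}{\sqrt{\chi^2(n-2)/(n-2)}} \sim T(n-2)$$

Similarly, $T = \frac{\hat\alpha - \alpha_0}{\sqrt{\hat\sigma^2 \left( \frac{1}{n} + \frac{\bar x^2}{S_{xx}} \right)}} \sim T(n-2)$.

b) $\sum x_i = 171.2$, $\sum x_i^2 = 2511.26$, $\sum y_i = 167.4$, $\sum y_i^2 = 2400.14$, $\sum x_i y_i = 2454.51$.

$S_{xx} = 2511.26 - (171.2)^2/12 = 68.8067$, $S_{yy} = 2400.14 - (167.4)^2/12 = 64.9100$,
$S_{xy} = 2454.51 - (171.2)(167.4)/12 = 66.2700$

$\hat\beta = \frac{66.2700}{68.8067} = 0.96$, $\hat\alpha = \bar y - \hat\beta \bar x = 0.21$, $\hat\sigma^2 = \frac{SSE}{12-2} = 0.1088$.

$H_0: \beta = 1$, $T = \frac{0.96 - 1}{\sqrt{0.1088/68.8067}} = -1.01 \Rightarrow$ p-value $= 2 \cdot P(T(10) > 1.01) = 2 \cdot 0.17 = 0.34$.
No reason to reject $H_0$.

$H_0: \alpha = 0$, $T = \frac{0.21 - 0}{\sqrt{0.1088 \left( \frac{1}{12} + \frac{(14.27)^2}{68.8067} \right)}} = 0.36 \Rightarrow$ p-value $= 2 \cdot P(T(10) > 0.36) = 2 \cdot 0.36 = 0.73$.
No reason to reject $H_0$. We have an example of regression through the origin.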
c) Now the regression function is $E(X \mid Y = y) = \alpha' + \beta' y$.

$H_0: \beta' = 1$, $T = \frac{1.02 - 1}{\sqrt{0.1148/64.91}} = 0.48 \Rightarrow$ p-value $= 2 \cdot P(T(10) > 0.48) = 2 \cdot 0.32 = 0.64$.

We can't reject $H_0$.

$H_0: \alpha' = 0$, $T = \frac{0.024 - 0}{\sqrt{0.1148 \left( \frac{1}{12} + \frac{(13.95)^2}{64.91} \right)}} = 0.04 \Rightarrow$ p-value $= 2 \cdot P(T(10) > 0.04) = 2 \cdot 0.48 = 0.97$.

No reason to reject $H_0$. Very strong reason for regression through the origin.
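
The sums of squares and the T-statistics in EX 104 can be reproduced with a few lines of code. The sketch below uses plain Python; the helper name `simple_regression` is illustrative and not from the book. It works with unrounded estimates, so the statistics differ slightly from hand computations that round the slope to two decimals.

```python
import math

x = [17.1, 10.5, 13.8, 15.7, 11.9, 10.4, 15.0, 16.0, 17.8, 15.8, 15.1, 12.1]
y = [16.1, 10.4, 13.5, 15.9, 11.6, 10.2, 14.1, 15.8, 17.6, 15.5, 14.8, 11.9]


def simple_regression(u, v, slope0=1.0, intercept0=0.0):
    """Regress v on u. Return slope, intercept, residual variance and the
    T-statistics for H0: slope = slope0 and H0: intercept = intercept0."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uu = sum(a * a for a in u) - sum(u) ** 2 / n
    s_vv = sum(b * b for b in v) - sum(v) ** 2 / n
    s_uv = sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n
    slope = s_uv / s_uu
    intercept = mean_v - slope * mean_u
    sigma2 = (s_vv - s_uv ** 2 / s_uu) / (n - 2)  # SSE / (n - 2)
    t_slope = (slope - slope0) / math.sqrt(sigma2 / s_uu)
    t_intercept = (intercept - intercept0) / math.sqrt(
        sigma2 * (1 / n + mean_u ** 2 / s_uu))
    return slope, intercept, sigma2, t_slope, t_intercept


b_yx, a_yx, s2_yx, tb_yx, ta_yx = simple_regression(x, y)  # Y on x
b_xy, a_xy, s2_xy, tb_xy, ta_xy = simple_regression(y, x)  # X on y

print(round(b_yx, 3), round(a_yx, 3), round(tb_yx, 2), round(ta_yx, 2))
print(round(b_xy, 3), round(a_xy, 3), round(tb_xy, 2), round(ta_xy, 2))
# Inverting the fitted line of Y on x does not give the fitted line of X on y:
print(round(-a_yx / b_yx, 3), round(1 / b_yx, 3))
```

All four statistics are small in absolute value compared with the quantiles of T(10), so the conclusions are the same in both regressions.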


Comment to EX 104 Sometimes it isn't crystal clear which of two variables should be regarded
as dependent. In such situations one may try to let both be dependent and check whether the two
regression analyses give consistent results. Notice however that a regression relation is different from a
mathematical relation. From the mathematical relation $y = \alpha + \beta x$ one can solve for the inverse relation
$x = -\alpha/\beta + (1/\beta) y = \alpha' + \beta' y$. E.g. in EX 104 we don't get $\alpha' = -0.21/0.96 = -0.22$, but $\alpha' = 0.024$.


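Several Y-observations at each x. Test of linearity.

Data is now $(y_{ij}, x_i)$, $j = 1, \ldots, n_i$, $i = 1, \ldots, k$. The expressions for estimation and tests of parameters
are the same as above. The data $(y_{ij}, x_i)$ is interpreted as $(Y_{11}, x_1), (Y_{12}, x_1), \ldots$ The difference is that we
now can test whether there is one linear regression line through the data or not.

Introduce the notations $\bar y_i = \frac{\sum_j y_{ij}}{n_i}$, $\bar x = \frac{\sum_i n_i x_i}{\sum_i n_i}$, $\bar y = \frac{\sum_i n_i \bar y_i}{\sum_i n_i}$. Then the parameter estimates can be
computed as

$$\hat\beta = \frac{\sum n_i x_i \bar y_i - (\sum n_i x_i)(\sum n_i \bar y_i)/\sum n_i}{\sum n_i x_i^2 - (\sum n_i x_i)^2/\sum n_i}, \quad \hat\alpha = \bar y - \hat\beta \bar x$$

The hypothesis to test is $H_0: E(Y \mid X = x) = \alpha + \beta x$. The test statistic for this is

$$F = \frac{\sum_i n_i \left( \bar y_i - (\hat\alpha + \hat\beta x_i) \right)^2 / (k-2)}{\sum_i \sum_j (y_{ij} - \bar y_i)^2 / (\sum n_i - k)}. \text{ The p-value for } H_0 \text{ is } P\left(F(k-2, \textstyle\sum n_i - k) > F_{OBS}\right) \qquad (31)$$

The test in Eq. (31) is illustrated in the following example.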


EX 105 Test the following (artificial) data for linearity:

x | 1     | 2   | 3
Y | 3,4,5 | 1,3 | 3,4,5

We first make the table:

x_i   | n_i | n_i x_i | n_i x_i^2 | y_ij  | ybar_i | n_i ybar_i | n_i x_i ybar_i
1     | 3   | 3       | 3         | 3,4,5 | 4      | 12         | 12
2     | 2   | 4       | 8         | 1,3   | 2      | 4          | 8
3     | 3   | 9       | 27        | 3,4,5 | 4      | 12         | 36
Total | 8   | 16      | 38        |       |        | 28         | 56

From this we get $\hat\beta = \frac{56 - (16)(28)/8}{38 - (16)^2/8} = 0$ and $\hat\alpha = \frac{28}{8} - 0 \cdot \frac{16}{8} = 3.5$.

The value of the test statistic can then be computed from the following table:

x_i   | n_i | y_ij  | ybar_i | n_i (ybar_i - (alpha + beta x_i))^2 | sum_j (y_ij - ybar_i)^2
1     | 3   | 3,4,5 | 4      | 3(4 - 3.5)^2 = 0.75                 | 1 + 0 + 1 = 2
2     | 2   | 1,3   | 2      | 2(2 - 3.5)^2 = 4.50                 | 1 + 1 = 2
3     | 3   | 3,4,5 | 4      | 3(4 - 3.5)^2 = 0.75                 | 1 + 0 + 1 = 2
Total | 8   |       |        | 6                                   | 6

The statistic is $F = \frac{6/(3-2)}{6/(8-3)} = 5.00 \Rightarrow$ p-value $= P(F(1,5) > 5.00) = 0.076$. The hypothesis of linearity
can't be rejected at the 5 % level. The p-value is however small and one should look for other alternatives than the
straight line.
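
The test of linearity in Eq. (31) is easy to program. The sketch below is one possible implementation in plain Python, applied to the data of EX 105; the function name `linearity_f` and the dictionary layout of the data are assumptions made for this illustration. The p-value is not computed, since that requires the F distribution from a table or from statistical software.

```python
def linearity_f(groups):
    """groups maps each x-value to the list of y-observations at that x.
    Returns (alpha_hat, beta_hat, F) for the test of linearity."""
    k = len(groups)
    n = sum(len(ys) for ys in groups.values())
    sum_x = sum(len(ys) * x for x, ys in groups.items())
    sum_y = sum(sum(ys) for ys in groups.values())
    sum_xx = sum(len(ys) * x * x for x, ys in groups.items())
    sum_xy = sum(x * sum(ys) for x, ys in groups.items())
    beta = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    alpha = sum_y / n - beta * sum_x / n
    lack_of_fit = 0.0  # group means around the fitted line
    pure_error = 0.0   # observations around their group means
    for x, ys in groups.items():
        mean = sum(ys) / len(ys)
        lack_of_fit += len(ys) * (mean - (alpha + beta * x)) ** 2
        pure_error += sum((v - mean) ** 2 for v in ys)
    f = (lack_of_fit / (k - 2)) / (pure_error / (n - k))
    return alpha, beta, f


alpha, beta, f = linearity_f({1: [3, 4, 5], 2: [1, 3], 3: [3, 4, 5]})
print(alpha, beta, round(f, 2))  # F has (k - 2, n - k) = (1, 5) degrees of freedom
```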


Regression towards the mean - how to 'lie' with regression analysis

When people with extreme values of the measurements, such as high blood pressure, are measured once
more it is found that the mean of the extreme group is closer to the mean of the whole population. If
people with extreme values are treated with some medicine the decrease may be interpreted as showing
the effect of the treatment (a significant negative value of $\beta$). The problem is that the mean level may go
down (significantly) even if people are not treated. This phenomenon, known as regression towards the
mean, can be explained by measurement errors and natural biological variations (cf. Davis, p. 493). It is
actually linked with the word 'regression' used by F. Galton in a paper from 1885, who found that the
heights of children of very short or very tall parents move toward the average. This false pattern is
more pronounced if we relate change to initial value. The following theoretical example is instructive
if you want to make an experiment which 'proves' that your 'hocus pocus drug' has a significant lowering
effect on blood pressure, anxiety, cholesterol, body weight etc. The intention is of course that you shall
use the knowledge to expose others, not to use it for your own purposes.


EX 106 Regression of 'Change' on 'Initial value'.
Introduce the following notations and assumptions:

$Y_1$ = Systolic Blood Pressure (SBP) at a point in time and $Y_2$ = SBP at a later time. Put $D = Y_2 - Y_1$. For simplicity it is
assumed that $V(Y_1) = V(Y_2) = \sigma^2$. The correlation between $Y_1$ and $Y_2$ is $\rho_{12} = \frac{Cov(Y_1, Y_2)}{\sigma^2}$.

We are interested in the relation between $Y_1$ and $D$. Assume that $(Y_1, D)$ has a bivariate Normal distribution and
consider the regression function $E(D \mid Y_1 = y_1) = \alpha + \beta y_1$. From Eq. (29) we know that $\beta = \frac{Cov(Y_1, D)}{V(Y_1)}$ and
from Ch. 2.1 we get

$Cov(Y_1, D) = E(Y_1 D) - E(Y_1)E(D) = E(Y_1(Y_2 - Y_1)) - E(Y_1)(E(Y_2) - E(Y_1)) = E(Y_1 Y_2) - E(Y_1)E(Y_2) - E(Y_1^2) + (E(Y_1))^2$
$= Cov(Y_1, Y_2) - V(Y_1) = \sigma^2 \rho_{12} - \sigma^2 = -\sigma^2 (1 - \rho_{12}) \Rightarrow \beta = -(1 - \rho_{12})$, which in
practice is negative.


The true regression line $E(D \mid Y_1 = y_1)$ will have a negative slope and it is thus likely that the estimated line also has a
negative slope.

Multiple regression

The regression function is now $E(Y \mid X_1 = x_1, \ldots, X_k = x_k) = \alpha + \beta_1 x_1 + \ldots + \beta_k x_k$. The assumptions about
the variables $(Y_i \mid x_{1i}, \ldots, x_{ki})_{i=1}^{n}$ are analogous to those in Eq. (30). Under the latter assumptions the best
estimators of the parameters are given by the OLS method (cf. Ch. 4.3.1). The estimators $\hat\alpha, \hat\beta_1, \ldots, \hat\beta_k$
can easily be expressed in matrix form but this is beyond the scope of this book. The reader is advised to
obtain the solutions by running some computer program, e.g. the procedures proc glm or proc reg in SAS.

Before considering the tests of interest we make some comments about the variables $X_1, \ldots, X_k$. The
latter are called independent variables in contrast to Y which is the dependent variable. The independent
variables may also be termed explanatory variables (often used by econometricians) or predictors. The
interpretation of a single regression parameter $\beta_j$ is that it shows how much the expectation of Y changes
when $x_j$ is increased by one unit and all other independent variables are fixed. But, if the x-variables
are inter-related, or more or less collinear, this is impossible. Collinearity may lead to biased parameter
estimates with great variance.


One special form of independent variable is the so-called dummy variable. Consider the following
examples:

- We want to study how Y = 'Amount of savings' depends on x = 'Salary' among men and women.
Introduce the dummy z = 1 for men and z = 0 for women. The regression model can be written

$$E(Y \mid x, z) = \alpha + \beta_1 x + \beta_2 z + \beta_3 xz = \begin{cases} \alpha + \beta_2 + (\beta_1 + \beta_3) x, & \text{if } z = 1 \\ \alpha + \beta_1 x, & \text{if } z = 0 \end{cases}$$

This is a comparison of two lines, one for men and one for women. Hypotheses of interest are
if $\beta_3 = 0$ (parallel lines), or if $\beta_2 = 0$ and $\beta_3 = 0$ (identical lines). Here $\beta_1$ is a separate salary-
effect regardless of sex, $\beta_2$ is a separate sex-effect regardless of salary and $\beta_3$ is a salary-effect
connected to sex. The latter parameter measures the interaction effect. When analyzing data
with this model in a computer the input data consists of values in 3 columns, Y, x, z. Then you
have to specify the model. E.g. in SAS you write the lines proc glm; model y=x z x*z;


- In the above example there was a comparison of two regression lines. When several regression
lines are to be compared things are a bit more complicated. Assume that we want to study how
Y = 'Household expenditure' depends on x = 'Salary' during the four seasons of the year. Since
there are four seasons we introduce three dummies such that

Season | z_1 | z_2 | z_3
Spring | 0   | 0   | 0
Summer | 1   | 0   | 0
Autumn | 0   | 1   | 0
Winter | 0   | 0   | 1

The model can be written

$$E(Y \mid x, z_1, z_2, z_3) = \alpha + \beta x + \beta_1 z_1 + \beta_2 z_2 + \beta_3 z_3 = \begin{cases} \alpha + \beta x, & \text{Spring} \\ \alpha + \beta_1 + \beta x, & \text{Summer} \\ \alpha + \beta_2 + \beta x, & \text{Autumn} \\ \alpha + \beta_3 + \beta x, & \text{Winter} \end{cases}$$

This is a comparison of four parallel lines. If we allow the four lines to have different slopes
we add $\beta_4 x z_1 + \beta_5 x z_2 + \beta_6 x z_3$ to the latter regression function.

Above we have defined dummies for two sexes and four seasons. In general, with c categories
we need c-1 dummies taking the values 1 and 0. In computer programs for estimating linear
models there may be other choices of dummies. The definition of those specific dummies that
have been used is seen in the beginning of the print-out.
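
The coding with c-1 dummies can be sketched as follows. This is only an illustration in plain Python; the function names `dummy_code` and `design_row` are hypothetical, and the first category (here Spring) is taken as the reference category, as in the table above. Statistical packages may use a different reference category.

```python
def dummy_code(value, categories):
    """Return the c-1 dummies for `value`; the first category is the
    reference and gets all zeros."""
    return [1 if value == cat else 0 for cat in categories[1:]]


def design_row(x, season, categories, different_slopes=False):
    """One row of explanatory values: intercept, x, the dummies and,
    optionally, the products x*z that allow different slopes."""
    z = dummy_code(season, categories)
    row = [1, x] + z
    if different_slopes:
        row += [x * zi for zi in z]
    return row


seasons = ["Spring", "Summer", "Autumn", "Winter"]
for s in seasons:
    print(s, dummy_code(s, seasons))

# Parallel lines: 1 + 1 + 3 = 5 columns. Different slopes: 3 more columns.
print(design_row(2.0, "Winter", seasons))
print(design_row(2.0, "Winter", seasons, different_slopes=True))
```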


The first result of interest in a multiple regression study is the ANOVA (Analysis of Variance) table.
This shows how the total variation of the Y-observations can be split up into two components. This is
shown in the following table:

Variance source    | Degrees of freedom | Sum of squares
Regression (Model) | k                  | $SSR = SST - SSE$
Error              | n-k-1              | $SSE = \sum_{i=1}^{n} \left( Y_i - (\hat\alpha + \sum_{j=1}^{k} \hat\beta_j x_{ij}) \right)^2$
Corrected Total    | n-1                | $SST = \sum_{i=1}^{n} (Y_i - \bar Y)^2$

From the table we get an unbiased estimator of $\sigma^2$, $\hat\sigma^2 = \frac{SSE}{n-k-1} \sim \sigma^2 \frac{\chi^2(n-k-1)}{n-k-1}$.

We also get a measure of the fit of the linear model, the Coefficient of Determination $R^2_{Y|x_1 \ldots x_k} = \frac{SSR}{SST}$,
taking values in the interval [0, 1]. Values close to 1 indicate that the explanatory ability of the model
is good. ($R^2$ is in fact the square of the Multiple correlation coefficient which is the correlation between Y
and $\hat\alpha + \sum_j \hat\beta_j x_{ij}$.) $R^2$ can never become smaller when more x-variables are included in the regression
model. As a measure of the gain in explanatory ability by including $x_{k+1}$ beyond $x_1 \ldots x_k$ one may use

$$\frac{R^2_{Y|x_1 \ldots x_{k+1}} - R^2_{Y|x_1 \ldots x_k}}{1 - R^2_{Y|x_1 \ldots x_k}},$$

i.e. the actual increase in relation to the maximal possible increase.


Four classes of hypotheses to be tested

1) $H_0$: All $\beta_j = 0$, $j = 1 \ldots k$. Test statistic is $T = \frac{SSR/k}{SSE/(n-k-1)}$. p-value $= P(F(k, n-k-1) > T_{OBS})$.
This should be the first hypothesis to test and if it isn't rejected there is no reason to continue,
but instead try to search for better explanatory variables.

2) $H_0$: Some $\beta_j = 0$. Test statistic is $T = \frac{\hat\beta_j}{\sqrt{\hat V(\hat\beta_j)}}$. p-value $= 2 \cdot P(T(n-k-1) > |T_{OBS}|)$.
Here $\hat\beta_j$ and $\sqrt{\hat V(\hat\beta_j)}$ are found from the computer out-print under the names 'Estimate' and
'Std error of estimate'. Notice that since $T^2(f) = F(1, f)$ (cf. (11) and (12) in Ch. 3.1) the p-value
can also be obtained from $P(F(1, n-k-1) > T^2_{OBS})$.

In this case it is perhaps more instructive to place a 95 % CI under each estimated $\beta_j$. The
latter is obtained from $\hat\beta_j \pm C \cdot \sqrt{\hat V(\hat\beta_j)}$ where C is determined from $P(T(n-k-1) > C) = 0.025$.

3) $H_0$: All $\beta_j = 0$ for $j = 1 \ldots k' < k$. Test statistic is $T = \frac{(SSE' - SSE)/k'}{SSE/(n-k-1)}$. Here SSE is
the sum of squares for 'Error' in the full model with k regression coefficients and SSE'
is the corresponding sum of squares in the reduced model with k-k' regression coefficients, so that the numerator
has k' degrees of freedom, the number of coefficients set to zero.
p-value $= P(F(k', n-k-1) > T_{OBS})$.

This test is perhaps the most useful one. It enables us to see whether the model with regression
function $\alpha + \beta_1 x_1 + \ldots + \beta_{k'} x_{k'} + \ldots + \beta_k x_k$ can be replaced by the regression function
$\alpha + \beta_{k'+1} x_{k'+1} + \ldots + \beta_k x_k$. The use of this test is illustrated in the following examples.


4) Tests about linear structures of the regression coefficients. Some examples are the following:

$H_0: E(Y \mid X = x) = \alpha_0 + \beta_0 x$. Here, $\alpha_0$, $\beta_0$ and x have fixed given values. E.g. $\alpha_0$ and $\beta_0$ are the
intercept and slope that have been observed during a long time for a production process and
one wants to test whether a new process gives the same regression relation at x.

Test statistic is $T = \frac{(\hat\alpha + \hat\beta x) - (\alpha_0 + \beta_0 x)}{\sqrt{\hat\sigma^2 \left( \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}} \right)}}$ with p-value $= 2 \cdot P(T(n-2) > |T_{OBS}|)$.

It is instructive to derive the above expression. Obviously, $\hat\alpha + \hat\beta x$ is unbiased for $\alpha + \beta x$.

$V(\hat\alpha + \hat\beta x)$ = [Cf. Eq. (2) in Ch. 2.1] $= V(\hat\alpha) + x^2 V(\hat\beta) + 2x \, Cov(\hat\alpha, \hat\beta) = \sigma^2 \left( \frac{1}{n} + \frac{\bar x^2}{S_{xx}} \right) + \frac{x^2 \sigma^2}{S_{xx}} - \frac{2 x \bar x \sigma^2}{S_{xx}}$
$= \sigma^2 \left( \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}} \right)$. Since $\hat\alpha$ and $\hat\beta$ each have normal distributions it follows that
$\frac{(\hat\alpha + \hat\beta x) - (\alpha_0 + \beta_0 x)}{\sqrt{V(\hat\alpha + \hat\beta x)}} \sim N(0,1)$. Now, dividing numerator and denominator in the expression for T
above by $\sqrt{V(\hat\alpha + \hat\beta x)}$ yields a statistic that is distributed as $\frac{N(0,1)}{\sqrt{\chi^2(n-2)/(n-2)}} \sim T(n-2)$.


An alternative to testing is to construct a CI for the true regression line at x:

$(\hat\alpha + \hat\beta x) \pm C \cdot \sqrt{\hat\sigma^2 \left( \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}} \right)}$, where C is determined from $P(T(n-2) > C) = \alpha/2$ to get a
$100(1-\alpha)\%$ CI. E.g. if we want a 90% CI when n = 12, then Tables over the T-distribution show that
$P(T(10) > 1.812) = 0.10/2 = 0.05$, so C = 1.812.

For several x-variables the computations are heavy and will not be shown here. Results can be obtained
from most computer programs. E.g. in SAS the code proc glm; model y=x1 x2 x3/clm p; will give
you 95% CIs for the expected means $E(Y \mid x_1, x_2, x_3) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$, together with predicted
(estimated) values.


Another type of linear structure among regression parameters is the following:
$H_0: \beta_2 = \beta_3 (= \beta)$ in the regression function $E(Y \mid x_1, x_2, x_3) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$.

This has been termed a test of aggregation, in this case aggregation of the variables $x_2$ and $x_3$. A typical
example is when Y is 'Prices of clothing', $x_2$ is 'Price of leather' and $x_3$ is 'Price of textile'. If $H_0$ is not
rejected this means that the effects of the prices of leather and textile can't be separated. The simplest way to
perform the test is to run two regression analyses, one with $x_1, x_2, x_3$ as independent variables giving
rise to SSE, and one with $x_1, (x_2 + x_3)$ as independent variables giving rise to SSE'. The test statistic is

$T = \frac{(SSE' - SSE)/(3-2)}{SSE/(n-4)}$ with p-value $= P(F(1, n-4) > T_{OBS})$.

EX 107 Data below shows a sample of 12 persons employed in a company where Y = 'Weekly earnings (in 1000 SEK)'
of various ages (year) and sex (0 = Woman, 1 = Man). Assume that the Y-values are normally distributed.

Weekly earnings | 5  | 6  | 8  | 8  | 6  | 10 | 8  | 11 | 9  | 11 | 13 | 13
Age (x)         | 20 | 30 | 35 | 35 | 40 | 40 | 45 | 45 | 50 | 55 | 55 | 60
Sex (z)         | 0  | 0  | 0  | 1  | 0  | 1  | 0  | 1  | 1  | 1  | 1  | 1

a) Test whether the mean salary differs between men and women. (Use an ordinary T-test.)

b) Study if mean salary increases with age by using the model $E(Y \mid x) = \alpha + \beta x$. From the out-print you get the
following results:

ANOVA table
                | df | SS
Regression      | 1  | 59.00
Error           | 10 | 19.00
Corrected Total | 11 | 78.00

Test $H_0: \beta = 0$ and compute the Coefficient of Determination. (Std Error of Estimate in the out-print is simply
$\sqrt{\hat V(\hat\beta)}$.)

c) Find a proper regression model that describes how mean salary depends on both age and sex.

Model: $E(Y \mid x, z) = \alpha + \beta_1 x + \beta_2 z$
ANOVA table
                | df | SS      | Parameter | Estimate | Std Error of Estimate
Regression      | 2  | 66.2396 | $\beta_1$ | 0.14     | 0.039
Error           | 9  | 11.7604 | $\beta_2$ | 2.07     | 0.879
Corrected Total | 11 | 78.00   |           |          |

Model: $E(Y \mid x, z) = \alpha + \beta_1 x + \beta_2 z + \beta_3 x \cdot z$
ANOVA table
                | df | SS      | Parameter | Estimate | Std Error of Estimate
Regression      | 3  | 67.1659 | $\beta_1$ | 0.10     | 0.060
Error           | 8  | 10.8341 | $\beta_2$ | -0.61    | 3.358
Corrected Total | 11 | 78.00   | $\beta_3$ | 0.066    | 0.080

d) Comment on the following statement: 'It's true that men earn more than women, but this is due to the fact that
men at the company tend to be older than women and salary increases with age.'

EX 107 (Continued)

a) Women (z = 0): $\bar y_W = \frac{33}{5} = 6.60$, $s_W^2 = \frac{225 - (33)^2/5}{5-1} = 1.80$

Men (z = 1): $\bar y_M = \frac{75}{7} = 10.714$, $s_M^2 = \frac{825 - (75)^2/7}{7-1} = 3.5714$

$H_0: \sigma_W^2 = \sigma_M^2$, $F = \frac{3.5714}{1.80} = 1.984 \Rightarrow$ p-value $= P(F(6,4) > 1.984) = 0.26$. No reason to reject $H_0$.

The pooled variance estimate is $\hat\sigma^2 = \frac{4 \cdot 1.80 + 6 \cdot 3.5714}{4 + 6} = 2.863$.

$H_0: \mu_W = \mu_M$, $T = \frac{10.714 - 6.60}{\sqrt{2.863 \left( \frac{1}{5} + \frac{1}{7} \right)}} = 4.153 \Rightarrow$ p-value $= 2 \cdot P(T(10) > 4.153) = 2 \cdot 0.001 = 0.002$.

Reject $H_0$, women have significantly lower earnings.

b) $H_0: \beta = 0$, $T = \frac{\hat\beta}{\sqrt{\hat V(\hat\beta)}} = 5.56 \Rightarrow$ p-value $= 2 \cdot P(T(10) > 5.56) = 2 \cdot 0.0001 = 0.0002$.

Reject $H_0$, there is a strong linear relation between age and earnings.

The Coefficient of Determination is $R^2_{Y|x} = \frac{59.00}{78.00} = 0.756$ (76%).

c) Model: $E(Y \mid x, z) = \alpha + \beta_1 x + \beta_2 z$

$H_0: \beta_1 = 0$, $T = \frac{0.14}{0.039} = 3.59 \Rightarrow$ p-value $= 2 \cdot P(T(9) > 3.59) = 0.006$. Reject $H_0$.

$H_0: \beta_2 = 0$, $T = \frac{2.07}{0.879} = 2.35 \Rightarrow$ p-value $= 2 \cdot P(T(9) > 2.35) = 0.043$. Reject $H_0$.

Both x and z have a significant effect on salary. $R^2_{Y|x,z} = \frac{66.2396}{78.00} = 0.849$ (85%).

Model: $E(Y \mid x, z) = \alpha + \beta_1 x + \beta_2 z + \beta_3 x \cdot z$

$H_0: \beta_1 = 0$, $T = \frac{0.10}{0.060} = 1.70 \Rightarrow$ p-value $= 2 \cdot P(T(8) > 1.70) = 0.13$. There is no significant effect of x.

$H_0: \beta_2 = 0$, $T = \frac{-0.61}{3.358} = -0.18 \Rightarrow$ p-value $= 2 \cdot P(T(8) > 0.18) = 0.86$. There is no significant effect of z.

$H_0: \beta_3 = 0$, $T = \frac{0.066}{0.080} = 0.83 \Rightarrow$ p-value $= 2 \cdot P(T(8) > 0.83) = 0.43$. There is no significant effect of $x \cdot z$.

In this case $R^2_{Y|x,z,xz} = \frac{67.1659}{78.00} = 0.861$. The relative increase of $R^2$ by introducing $x \cdot z$ is only
$\frac{0.861 - 0.849}{1 - 0.849} = 8\%$. There is no point in assuming that there are two lines with different slopes.

d) It's true that earnings increase with age, but it's also true that there is a separate sex effect on the earnings that
has nothing to do with the age effect.
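
The quantities derived from the ANOVA tables in EX 107 can be checked with a short script. The sketch below is plain Python with illustrative helper names (`r_squared`, `partial_f`) that are not from the book. It also computes the F-statistic for comparing the full model (with the product x·z) against the reduced model (without it); with one dropped coefficient this statistic should be close to the square of the T-statistic for $\beta_3$, apart from rounding in the printed estimates.

```python
def r_squared(ssr, sst):
    """Coefficient of Determination, SSR / SST."""
    return ssr / sst


def partial_f(sse_reduced, sse_full, n_dropped, df_error_full):
    """F-statistic for testing that n_dropped coefficients of the full
    model are zero: ((SSE' - SSE) / n_dropped) / (SSE / df_error_full)."""
    return ((sse_reduced - sse_full) / n_dropped) / (sse_full / df_error_full)


sst = 78.00
r2_age = r_squared(59.00, sst)      # model with x
r2_main = r_squared(66.2396, sst)   # model with x and z
r2_inter = r_squared(67.1659, sst)  # model with x, z and x*z

# Actual increase of R^2 relative to the maximal possible increase
gain = (r2_inter - r2_main) / (1 - r2_main)

# Full model: x, z, x*z (error df = 8). Reduced model: x, z.
f_inter = partial_f(11.7604, 10.8341, 1, 8)

print(round(r2_age, 3), round(r2_main, 3), round(r2_inter, 3))
print(round(gain, 3), round(f_inter, 3), round((0.066 / 0.080) ** 2, 3))
```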


6.6.2 Random Coefficient models 


When repeated measurements are obtained in time from a sample of persons or companies, the Gauss-
Markov model in the preceding section can be very poor. This type of data is called panel data by
econometricians and longitudinal data or follow-up data by biometricians. The typical pattern in this
data is that the measurements from each chosen sampling unit have their own development. When the latter
develop along straight lines, each line has its own intercept and possibly also its own slope. If the Gauss-
Markov model is used and one single model is fitted to the data, the conclusions can be totally wrong.
If each individual slope is positive, the slope of the line fitted by the Gauss-Markov model can be negative and vice
versa. This is nicely illustrated in Diggle et al, p. 1. Models where slopes and intercepts are allowed to
be random are called Random Coefficient models. A general exposition of these models is beyond the
scope of this book. Here we only illustrate the inference when intercepts are random and slopes are
fixed, Error Components Regression (ECR) models, and when both intercepts and slopes vary randomly,
Random Coefficient Regression (RCR) models.


Assume that measurements are made on n persons at the same times $i = 1, \ldots, t$. (The latter assumption
will simplify the computations considerably.) The value obtained for the j:th person, $j = 1, \ldots, n$, at time i
is denoted $Y_{ij}$. The two models are

ECR: $Y_{ij} = A_j + \beta \cdot x_i + E_{ij}$

RCR: $Y_{ij} = A_j + B_j \cdot x_i + E_{ij}$

The assumptions are: $E_{ij} \sim N(0, \sigma^2)$, $A_j \sim N(\alpha, \sigma_A^2)$, $B_j \sim N(\beta, \sigma_B^2)$, $Cov(A_j, B_j) = \sigma_{AB}$. All other components
are uncorrelated. It seems hard to motivate all those Normality assumptions, especially that slopes have
Normal distributions, but the assumptions are needed to reach any results in the inference. In both
models $E(Y_{ij}) = \alpha + \beta \cdot x_i$. In the ECR model $V(Y_{ij}) = \sigma_A^2 + \sigma^2$ and $Cov(Y_{1j}, Y_{2j})$ = [cf. Ch. 2.1, property (7)]

$= E\left( (A_j + \beta x_1 + E_{1j})(A_j + \beta x_2 + E_{2j}) \right) - (\alpha + \beta x_1)(\alpha + \beta x_2)$
$= E\left( A_j^2 + A_j \beta x_2 + A_j E_{2j} + A_j \beta x_1 + \beta^2 x_1 x_2 + \beta x_1 E_{2j} + A_j E_{1j} + \beta x_2 E_{1j} + E_{1j} E_{2j} \right) - \alpha^2 - \alpha \beta x_1 - \alpha \beta x_2 - \beta^2 x_1 x_2$
= [Many terms cancel each other out] $= E(A_j^2) - \alpha^2 = V(A_j) = \sigma_A^2$.

Thus, the correlation between $Y_{1j}$ and $Y_{2j}$ is $Cov(Y_{1j}, Y_{2j}) / \sqrt{V(Y_{1j}) V(Y_{2j})} = \sigma_A^2 / (\sigma_A^2 + \sigma^2)$. So, there is a constant
correlation between measurements within each person. In the RCR model similar calculations yield
$V(Y_{ij}) = \sigma_A^2 + 2 x_i \sigma_{AB} + x_i^2 \sigma_B^2 + \sigma^2$ and $Cov(Y_{ij}, Y_{i'j}) = \sigma_A^2 + (x_i + x_{i'}) \sigma_{AB} + x_i x_{i'} \sigma_B^2$. From these expressions it
is seen that a simple way to decide whether data follow an ECR or an RCR model is to plot estimates of
$V(Y_{ij})$ against $x_i$. If the latter relationship is constant, then we have an ECR model. On the other hand,
if the relationship shows a quadratic pattern we have an RCR model. A formal statistical test that enables
us to choose between the two models is presented below.
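The following statistics will be needed.

$\bar x = \frac{1}{t} \sum_i x_i$, $\bar Y_j = \frac{1}{t} \sum_i Y_{ij}$, $S_{\bar Y \bar Y} = \sum_j \bar Y_j^2 - \frac{1}{n} \left( \sum_j \bar Y_j \right)^2$,
$S_{xx} = \sum_i x_i^2 - \frac{1}{t} \left( \sum_i x_i \right)^2$, $S_{xY_j} = \sum_i x_i Y_{ij} - \frac{1}{t} \left( \sum_i x_i \right) \left( \sum_i Y_{ij} \right)$, $S_{YY_j} = \sum_i Y_{ij}^2 - \frac{1}{t} \left( \sum_i Y_{ij} \right)^2$,
$\hat B_j = \frac{S_{xY_j}}{S_{xx}}$, $\hat A_j = \bar Y_j - \hat B_j \bar x$, $\hat\beta = \frac{1}{n} \sum_j \hat B_j$, $\hat\alpha = \frac{1}{n} \sum_j \hat A_j$, $SSE_j = \sum_i \left( Y_{ij} - (\hat A_j + \hat B_j x_i) \right)^2 = S_{YY_j} - \hat B_j^2 S_{xx}$.
(This last relation is proved in EX 81.) $SSE = \sum_j SSE_j$, $S_{AA} = \sum_j \hat A_j^2 - \frac{1}{n} \left( \sum_j \hat A_j \right)^2$,
$S_{BB} = \sum_j \hat B_j^2 - \frac{1}{n} \left( \sum_j \hat B_j \right)^2$.

The hypotheses to test are $H_0$: ECR against $H_a$: RCR $\Leftrightarrow$ $H_0: \sigma_B^2 = 0$ against $H_a: \sigma_B^2 > 0$. The test
statistic for this is $T = S_{xx} \cdot \frac{S_{BB}/(n-1)}{SSE/(n(t-2))}$ with p-value $= P(F(n-1, n(t-2)) > T_{OBS})$. It is shown in Petzold &
Jonsson 2003, p6 that this test statistic is identical with the corresponding F statistic in Hsiao 2003, p15. For the
more general model with k regression coefficients the reader is referred to the latter citations. Since
computations can be quite heavy, the analysis with these types of models is facilitated by utilizing
software, e.g. proc mixed in SAS.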


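Depending on the outcome of the latter test we go further.

ECR model: $\hat\sigma^2 = \frac{SSE + S_{xx} S_{BB}}{n(t-1)-1}$, $\hat\sigma_A^2 = \frac{S_{\bar Y \bar Y}}{n-1} - \frac{\hat\sigma^2}{t}$, $\hat V(\hat\beta) = \frac{\hat\sigma^2}{n S_{xx}}$, $\hat V(\hat\alpha) = \frac{1}{n} \left( \hat\sigma_A^2 + \hat\sigma^2 \left( \frac{1}{t} + \frac{\bar x^2}{S_{xx}} \right) \right)$.

$H_0: \beta = \beta_0$ against $H_a: \beta \neq \beta_0$ is tested by $T = \frac{\hat\beta - \beta_0}{\sqrt{\hat V(\hat\beta)}}$ with p-value $= 2 \cdot P(T(n(t-1)-1) > |T_{OBS}|)$.

RCR model: $\hat\sigma^2 = \frac{SSE}{n(t-2)}$, $\hat\sigma_A^2 = \frac{S_{AA}}{n-1} - \hat\sigma^2 \left( \frac{1}{t} + \frac{\bar x^2}{S_{xx}} \right)$, $\hat\sigma_B^2 = \frac{S_{BB}}{n-1} - \frac{\hat\sigma^2}{S_{xx}}$, $\hat V(\hat\beta) = \frac{1}{n} \left( \hat\sigma_B^2 + \frac{\hat\sigma^2}{S_{xx}} \right)$.

$\hat V(\hat\alpha)$ is the same as for the ECR model.
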
$H_0: \beta = \beta_0$ against $H_a: \beta \neq \beta_0$ is tested by $T = \frac{\hat\beta - \beta_0}{\sqrt{\hat V(\hat\beta)}}$ with p-value $= 2 \cdot P(T(n-1) > |T_{OBS}|)$.

EX 108

The table below shows the concentration of HbA1c (Glycosylated hemoglobin) measured at three points in time
($x_i = 1, 2, 3$ months) on 18 patients with diabetes. The purpose of the study was to see whether it is possible to
reduce the HbA1c level among patients by dietary advice.

j     | Y_1j | Y_2j | Y_3j | Ybar_j | B_j   | A_j    | SSE_j
1     | 6.4  | 6.3  | 7.6  | 6.77   | 0.60  | 5.57   | 0.327
2     | 9.1  | 8.5  | 8.2  | 8.60   | -0.45 | 9.50   | 0.015
3     | 7.6  | 8.2  | 6.8  | 7.53   | -0.40 | 8.33   | 0.667
4     | 7.3  | 7.3  | 7.0  | 7.20   | -0.15 | 7.50   | 0.015
5     | 9.6  | 9.7  | 8.7  | 9.33   | -0.45 | 10.23  | 0.202
6     | 9.3  | 8.8  | 8.5  | 8.87   | -0.40 | 9.67   | 0.007
7     | 8.3  | 7.5  | 7.8  | 7.87   | -0.25 | 8.37   | 0.202
8     | 8.1  | 7.9  | 7.3  | 7.77   | -0.40 | 8.57   | 0.027
9     | 8.6  | 7.4  | 7.9  | 7.97   | -0.35 | 8.67   | 0.482
10    | 8.2  | 8.1  | 7.5  | 7.93   | -0.35 | 8.63   | 0.042
11    | 7.4  | 7.0  | 6.7  | 7.03   | -0.35 | 7.73   | 0.002
12    | 6.8  | 6.7  | 6.5  | 6.67   | -0.15 | 6.97   | 0.002
13    | 8.4  | 8.8  | 7.9  | 8.37   | -0.25 | 8.87   | 0.282
14    | 9.2  | 8.9  | 8.8  | 8.97   | -0.20 | 9.37   | 0.007
15    | 7.9  | 8.2  | 7.4  | 7.83   | -0.25 | 8.33   | 0.202
16    | 7.2  | 6.8  | 6.4  | 6.80   | -0.40 | 7.60   | 0.000
17    | 8.0  | 7.6  | 7.0  | 7.53   | -0.50 | 8.53   | 0.007
18    | 10.2 | 11.2 | 8.9  | 10.10  | -0.65 | 11.40  | 1.815
Total |      |      |      |        | -5.35 | 153.83 | 4.298

$\bar x = 2$, $S_{xx} = 2$, $S_{\bar Y \bar Y} = 15.1805$, $\hat\beta = \frac{1}{18}(-5.35) = -0.297$, $\hat\alpha = \frac{1}{18} \cdot 153.83 = 8.546$, $S_{AA} = 28.1$, $S_{BB} = 1.127$

$H_0$: ECR against $H_a$: RCR is tested by

$T = 2 \cdot \frac{1.127/(18-1)}{4.298/(18(3-2))} = 0.55 \Rightarrow$ p-value $= P(F(17, 18) > 0.55) = 0.88$. No reason to reject the ECR model.

For the ECR model, $\hat\sigma^2 = \frac{4.298 + 2 \cdot 1.127}{18(3-1)-1} = 0.187$, so $\hat V(\hat\beta) = \frac{0.187}{18 \cdot 2} = 0.0052$.

$H_0: \beta = 0$ against $H_a: \beta \neq 0$ is tested by

$T = \frac{-0.297}{\sqrt{0.0052}} = -4.12 \Rightarrow$ p-value $= 2 \cdot P(T(18(3-1)-1) > 4.12) = 0.0001$. This means that $H_0$ is
strongly rejected.

The conclusion is that there is a significant reduction of the HbA1c level and this reduction is similar for all patients
with a mean rate of about 0.3 units per month.
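
The computations in EX 108 can be reproduced from the individual slopes and the total SSE in the table. The following is a sketch in plain Python under the assumptions of the example (n = 18 persons, t = 3 time points, $S_{xx} = 2$); the variable names are chosen for this illustration only.

```python
import math

# Individual slope estimates B_j from the table in EX 108
slopes = [0.60, -0.45, -0.40, -0.15, -0.45, -0.40, -0.25, -0.40, -0.35,
          -0.35, -0.35, -0.15, -0.25, -0.20, -0.25, -0.40, -0.50, -0.65]
sse = 4.298                 # total of the SSE_j column
n, t, s_xx = 18, 3, 2.0     # persons, time points, S_xx for x = 1, 2, 3

s_bb = sum(b * b for b in slopes) - sum(slopes) ** 2 / n

# Test of ECR against RCR, to be compared with F(n - 1, n(t - 2))
f_stat = s_xx * (s_bb / (n - 1)) / (sse / (n * (t - 2)))

# ECR model: variance estimate and test of H0: beta = 0
sigma2 = (sse + s_xx * s_bb) / (n * (t - 1) - 1)
beta_hat = sum(slopes) / n
var_beta = sigma2 / (n * s_xx)
t_stat = beta_hat / math.sqrt(var_beta)

print(round(s_bb, 3), round(f_stat, 2))
print(round(sigma2, 3), round(var_beta, 4), round(t_stat, 2))
```

The p-values are then read from tables of the F(17, 18) and T(35) distributions, or obtained from statistical software.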


6.7 Final words 


Statistics is not an exact science in the sense that there are clear-cut solutions to every problem. When
analyzing linear models you find two opposite schools. The 'significance fundamentalists' argue that all
non-significant parameters must be deleted from the model. The argument is that unnecessary parameters
'steal' degrees of freedom so that other parameters may not clear the 5% p-value threshold. On the other
hand there are 'significance liberals' who retain all parameters in the model that they find interesting.
The author's personal view is close to that of a 'significance fundamentalist'.

The square root of the estimated variance of a statistic is called Standard Error of Estimate. This is an
old-fashioned name, but is now common in computer printouts.




Recall that there are two different ways to test for equality of rates in a Poisson process. One is based on
interval data, intervals between events (EX 80), and one is based on counts, or frequency data (EX 84).

In this book you find several examples of tests of linearity in regression models. This is an area that has
been overlooked. Many examples where significance can't be established may be due to the fact that a
linear model is used where a non-linear model would be more adequate. Closeness to linearity is often
said to be measured by $R^2$, the coefficient of determination. However, the F-test in (31) Ch. 6.6 is much
more efficient in detecting deviation from linearity. In EX 146 where linearity was rejected by the F-test,
one obtains $R^2 = 0.969$, which is large.

As you have noticed, tests of hypotheses in linear models require heavy computations. It is therefore
desirable that you install reliable statistical software on your computer. This is of special importance
when dealing with random coefficient models (Ch. 6.6.2) where more or less sophisticated software is
available under the name of 'mixed models'.

When communicating results from a statistical analysis you should avoid expressions like '$H_0$ against $H_a$'.
(This is for internal use among statisticians.) Instead use formulations like 'The new method gives
significantly lower values than the old method (p<0.01, two-sided Sign test)', just to take an example.


Supplementary Exercises, Ch. 6 


EX 109 Gregor Mendel is said to be the founder of the science of genetics. He performed a large number of
experiments to test his theories and many of these data are still available. In one famous experiment he cross-pollinated
smooth yellow pea plants with wrinkly green peas with the following result:

(Shape, Color)         | (Round, Yellow) | (Wrinkly, Yellow) | (Round, Green) | (Wrinkly, Green)
Theoretical proportion | 9/16            | 3/16              | 3/16           | 1/16
Observed frequency     | 315             | 108               | 101            | 32

a) Make a 2 x 2 table of the observed frequencies in terms of the factors Color and Shape.
b) Test whether the observed frequencies are in accordance with Mendel's theory.


EX 110 The number of white blood cells per cubic millimeter is known to vary according to a Poisson distribution.
10 blood samples from the same person showed the following number of white blood cells: 81 38 63 63 50 63 69 50
38 31.

a) Compare the sample mean and variance. Conclusion?
b) Use the Chi-square principle to test whether the observations come from the same Poisson distribution.

EX 111 The number of persons being on sick leave per day was recorded at a department, with the following result:

Number on sick leave | 0  | 1  | 2 | 3 | 4 | 5-
Frequency            | 12 | 10 | 6 | 0 | 2 | 0

Determine whether the Poisson distribution is an adequate model for the outcome.


EX 112 The Normal distribution is often taken for granted without giving any support for this assumption. Show how
the Chi-square principle can be used in order to test whether the following (ordered) data can be assumed to be
Normally distributed:

23 23 24 27 29 31 32 33 35 36 36 37 40 42 43 43 44 45 48 48 54 54 56 57 57 58 58 58 58 59 61 61 62 63 64 65 66 68 68
70 73 74 75 77 81 87 89 93 97

[Hint: Use some classification, e.g. -39, 40-60, 61-80, 81-.]


EX 113 Several independent Binomial samples.


In EX 77 and EX 83 the proportions in two independent samples were compared. Consider now $k$ independent variables $(Y_i)_{i=1}^{k}$, where


$Y_i \sim \text{Binomial}(n_i, p_i)$ and $H_0: p_1 = \dots = p_k\ (= p)$.


Under the null hypothesis $\hat{p} = \sum_i Y_i / \sum_i n_i$ is BLUE (cf. EX 46). The statistic for testing $H_0$ is (Rao 1965, p. 333)


$T = \frac{1}{\hat{p}(1-\hat{p})} \sum_{i=1}^{k} n_i (\hat{p}_i - \hat{p})^2 \sim \chi^2(k-1)$ under $H_0$, where $\hat{p}_i = Y_i / n_i$.
Consider the following Norwegian data: 


Season Spring Summer Autumn Winter 
Number of born boys 9251 7967 7327 7662 
Number of births 17866 15408 14251 14885 


Test whether the proportion born boys is the same for all seasons. 
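
The statistic T above is easy to evaluate by computer. The following is a minimal sketch in Python (standard library only), not the book's own solution: the function names `chi2_sf` and `homogeneity_test` are illustrative assumptions, and the p-value relies on the large-sample Chi-square reference distribution with k - 1 degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """P(chi2(df) > x) for a positive integer df, using the finite-sum formulas."""
    y = x / 2.0
    if df % 2 == 0:
        term = total = 1.0                  # i = 0 term of the sum
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from P(chi2(1) > x) = erfc(sqrt(x/2)) and add the extra terms.
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (i + 0.5)
    return total


def homogeneity_test(successes, trials):
    """Return (pooled proportion, T, p-value) for k independent Binomial samples."""
    p_hat = sum(successes) / sum(trials)
    t = sum(n * (y / n - p_hat) ** 2 for y, n in zip(successes, trials))
    t /= p_hat * (1.0 - p_hat)
    return p_hat, t, chi2_sf(t, len(trials) - 1)


# Norwegian data: Spring, Summer, Autumn, Winter
boys = [9251, 7967, 7327, 7662]
births = [17866, 15408, 14251, 14885]
p_hat, t_stat, p_value = homogeneity_test(boys, births)
print(f"pooled p = {p_hat:.4f}, T = {t_stat:.3f}, p-value = {p_value:.3f}")
```

The pooled proportion and T are computed exactly as in the formula; only the tail probability is an approximation.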


EX 114 Test for independence between (A, Non-A) and (B, Non-B) in the following three contingency tables
(fictitious data).


Group 1
          B     Non-B   Total
A         10    40      50
Non-A     20    80      100
Total     30    120     150

Group 2
          B     Non-B   Total
A         60    40      100
Non-A     30    20      50
Total     90    60      150

Group 1+2
          B     Non-B   Total
A         70    80      150
Non-A     50    100     150
Total     120   180     300


Conclusions? 
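
For tables of this kind the Chi-square statistic for independence has a simple closed form. The sketch below (Python, standard library only) is an illustration, not the book's solution: the helper name `chi2_2x2` is an assumption, and no continuity correction is applied.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson Chi-square statistic and p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))   # P(chi2(1) > stat)
    return stat, p_value


# Cell counts in the order (A,B), (A,Non-B), (Non-A,B), (Non-A,Non-B)
tables = {
    "Group 1": (10, 40, 20, 80),
    "Group 2": (60, 40, 30, 20),
    "Group 1+2": (70, 80, 50, 100),
}
results = {name: chi2_2x2(*cells) for name, cells in tables.items()}
for name, (stat, p) in results.items():
    print(f"{name}: X2 = {stat:.3f}, p-value = {p:.3f}")
```

Running the three tables side by side makes it easy to compare the separate groups with the pooled table.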


EX 115 In a sample of 100 couples, the husbands and wives were asked about their opinions of a politician. The
result was:


                    Husband
                    Positive   Negative
Wife   Positive     6          10
       Negative     24         60


a) Estimate the proportions of positive husbands and wives, respectively. Determine whether the difference between
the proportions is significant by using the Chi-square and the LR principles.
b) Are the opinions of husbands and wives independent? 


EX 116 In a medical rehabilitation project patients with different degrees of estimated working capacity (low, medium,
high) received different types of training (physical, activation, education). The following frequencies were obtained:


                           Working capacity
Type of training    Low    Medium   High   Total
Physical            119    80       21     220
Activation          363    50       34     447
Education           23     12       4      39
Total               505    142      59     706


Is there an association between estimated working capacity and the type of training? If so, investigate which 
combinations are over/under-represented. 


EX 117 During a severe epidemic 40 % of the population were on sick leave. A telephone survey to five randomly 
chosen institutions at a University gave the following result: 


Institution no 1 2 3 4 5 
Number on sick leave 4 10 8 2 6 
Total number of employees 10 42 25 11 12 


Test whether University employees are on sick leave to the same extent as the rest of the population. 


EX 118 In a factory there were 10 accidents during 2 weeks. After this the equipment was renewed, and during the
following 3 weeks there were 5 accidents. Did the measures have a significant effect on the rate of accidents?


[Hint: Use the conditional Poisson property and compute p-values from the Binomial distribution.] 
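
The hint can be turned into a short computation. The sketch below assumes that, under equal accident rates per week, the number of accidents in the first 2 weeks given the total of 15 is Binomial(15, 2/5); the function name `binom_upper_tail` and the choice of a one-sided alternative (a lower rate after the renewal) are assumptions made for illustration.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


before, after = 10, 5          # accidents in the two periods
weeks_before, weeks_after = 2, 3

n_total = before + after
p0 = weeks_before / (weeks_before + weeks_after)   # share of exposure time before renewal

# One-sided p-value: probability of at least 10 of the 15 accidents
# falling in the first period if the rate had not changed.
p_one_sided = binom_upper_tail(before, n_total, p0)
print(f"one-sided p-value = {p_one_sided:.4f}")
```

A two-sided p-value would require a convention for the opposite tail, which is left open here.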
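

EX 119 The conditional Poisson property can be generalized to $k$ independent Poisson processes $N(i)$ of rates $\lambda_i$, observed during time periods of length $t_i$:


$P\left(N(1) = y_1, \dots, N(k) = y_k \mid \sum_i N(i) = n\right) = \frac{n!}{y_1! \cdots y_k!}\, p_1^{y_1} \cdots p_k^{y_k}$, where $p_i = \frac{\lambda_i t_i}{\sum_j \lambda_j t_j}$, $i = 1, \dots, k$.


The hypothesis $H_0: \lambda_1 = \dots = \lambda_k\ (= \lambda)$ is thus equivalent with $H_0: p_i = t_i / \sum_j t_j$, $i = 1, \dots, k$. It follows that the latter hypothesis can be tested by


$X^2 = \sum_{i=1}^{k} \frac{\left(y_i - n t_i / \sum_j t_j\right)^2}{n t_i / \sum_j t_j}$, which for equal $t_i$ reduces to $\sum_{i=1}^{k} \frac{(y_i - n/k)^2}{n/k}$


(cf. Stuart et al 1999, p. 393).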


for a Poisson distribution 


Test whether the data 19 16 20 25 are observations on the same Poisson distributed variable. 


EX 120 Let $(Y_i)_{i=1}^{n}$ be iid with $Y_i \sim \text{Exponential}(\lambda)$. Derive the RR for testing $H_0: \lambda = \lambda_0$ against $H_a: \lambda \neq \lambda_0$
at the 5% level. Consider the case n = 10.


[Hint: Use the Neyman-Pearson Lemma, mentioned in Ch. 6.2.3 and the property (5) in Ch. 2.2.2] 


EX 121 Let be ye and (Y, is be two independent sets of iid variables with Exponential distributions with 


parameters Ay and Ay , respectively. Show how the LR principle can be used to test Hy): Ay =A, (=A) . Perform 


the test when ny = AO, be: = 20, Ny = 60, a) = 40. 


EX 122 Let $(Y_i)_{i=1}^{n}$ be iid variables where $Y_i \sim \text{Geometric}(p)$.


a) Show how to test $H_0: p = 1/2$ against $H_a: p \neq 1/2$ by means of the LR principle.
b) Perform the test when $n = 50$ and $\sum y_i = 80$.


c) Give examples where this test may be of interest. 


EX 123 16 persons participated in a weight loss program. The body weight (in kg) of each person was measured 
initially (X) and after six months (Y). The following values were obtained of the difference D=X - Y (in increasing 
order): 


-1.8, -1.7, -1.4, 0.3, 0.6, 1.6, 1.7, 1.9, 2.3, 2.4, 2.8, 3.7, 4.5, 5.8, 6.3, 6.8 
Let $\mu_D = E(D)$ and test $H_0: \mu_D = 0$ against $H_a: \mu_D \neq 0$


a) By assuming that differences are normally distributed. 
b) By performing an exact Sign test based on the Binomial distribution. 
c) By using a normality approximation of the test in b). 


[Hint: In the last case the approximation can be improved by letting
$P(Y \le y) \approx P\left(Z \le \frac{y - E(Y) + 1/2}{\sqrt{V(Y)}}\right)$ and $P(Y \ge y) \approx P\left(Z \ge \frac{y - E(Y) - 1/2}{\sqrt{V(Y)}}\right)$.]
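
To see what the continuity correction in the hint does in practice, the exact sign test and its normal approximation can be computed side by side. This is a minimal sketch in Python (standard library only), assuming that Y is the number of positive differences, so that Y ~ Binomial(16, 1/2) under the null hypothesis, and that the two-sided p-value is obtained by doubling the upper tail; the variable names are illustrative.

```python
from math import comb, erfc, sqrt

diffs = [-1.8, -1.7, -1.4, 0.3, 0.6, 1.6, 1.7, 1.9,
         2.3, 2.4, 2.8, 3.7, 4.5, 5.8, 6.3, 6.8]
n = len(diffs)
y = sum(d > 0 for d in diffs)          # number of positive differences

# Exact two-sided p-value: double the upper tail of Binomial(n, 1/2).
upper_tail = sum(comb(n, j) for j in range(y, n + 1)) / 2 ** n
p_exact = min(1.0, 2 * upper_tail)

# Normal approximation with continuity correction:
# P(Y >= y) is approximated by P(Z >= (y - E(Y) - 1/2) / sqrt(V(Y))).
mean, sd = n / 2, sqrt(n) / 2
z = (y - mean - 0.5) / sd
p_normal = min(1.0, erfc(z / sqrt(2)))  # erfc(z / sqrt(2)) equals 2 * P(Z >= z)

print(f"y = {y}, exact p = {p_exact:.4f}, approximate p = {p_normal:.4f}")
```

Doubling the upper tail is appropriate here because the observed y lies above its null expectation n/2.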


EX 124 The same products were classified as Bad or Good by Municipal and State authorities. The result was


                    State
                    Bad    Good
Municipal   Bad     20     10
            Good    20     50


a) Test whether the classifications agree by testing equality between marginal frequencies. 
[Proportion Bad will suffice.] 
b) Test whether the classifications from the two authorities are independent. 


EX 125 Yeast cells were counted in a hemacytometer with the following result: 


Number of yeast cells per square 0 1 2 3 4 5 6 


Frequency 103 143 98 42 8 4 2 


Check whether the frequencies are in accordance with the Poisson distribution.


EX 126 In a factory, men work in three shifts: Morning, Day and Night. From each shift a random sample
of 200 products was chosen and the number of defective products was recorded with the following result: 12 for
Morning, 10 for Day and 23 for Night.


Use a Chi-square test to draw conclusions from the data. 


[Hint: Construct a 2 x 3 table and test for independence.]


EX 127 Use the test for a difference between two Binomial proportions (Cf. EX 85.) to draw conclusions from the data 
in the preceding example. You may have to adjust the p-values for multiple comparisons (Cf. Ch. 6.4.). 


EX 128 In a study it was found that 41 of 248 identical twins were left-handed and that 18 of 246 fraternal twins were 
left-handed. Is the difference significant? 


EX 129 In the middle of the 1950s the Salk vaccine against polio was tested in the USA in several multi-center studies. In one
such study 20 000 children were vaccinated and among these 1 case of polio was detected, compared with 114 cases
of polio among 473 000 unvaccinated children. What conclusion can you draw about the effect of the vaccine?


EX 130 In a sample of 300 families the standard of the electronic equipment was classified as Cheap or Expensive. 
The families were also classified according to social class as Low-Middle-High. The result was 


                       Class
                       Low    Middle   High   Total
Standard   Cheap       38     88       31     160
           Expensive   62     42       39     140
           Total       100    130      70     300


Test whether Standard of equipment and Social class are independent and if not, try to find some significant patterns.




EX 131 In 2012 the yearly incidence of malignant melanoma in Sweden was 35 cases per 100 000 persons. The same
year 60 cases were observed in the city of Malmö in southern Sweden, with 110 000 inhabitants. Does this indicate
that inhabitants in Malmö had a significantly higher risk of malignant melanoma than people in the rest of Sweden?


EX 132 A dealer takes a sample of 200 oranges from a large batch from his importer. He notices that 19 of these are 
of bad quality while the rest are acceptable. In a delivery from a new importer he finds that 10 oranges of 200 are bad 
while the rest are acceptable. Shall he prefer the new importer? 


a) Discuss whether a one-sided or a two-sided test is preferable.

b) Test whether the proportion of bad oranges is the same with the former and with the new importer.

c) Use the ordinary Chi-square test to test for independence between Quality of oranges and Importer. Compare
the result with b).


EX 133 In a school with 156 pupils 90 were offered vaccination against a certain disease. After half a year the effect of
the vaccination was studied, with the following result:


Diseased Not diseased Total 
Vaccinated 4 86 90 
Not vaccinated 18 48 66 
Total 22 134 156 


Did the vaccination have a significant effect? 


[Hint: Repeat the arguments that were given in the preceding example a)-c).]


EX 134 A certain kind of surgical treatment may lead to complications. A comparison of two methods gave the
following frequencies:


Old method New method Total 

Complications 15 1 16 
No complications 83 46 129 
Total 98 47 145 


Is the new method better than the old method? 


a) Use the Chi-square test for independence. 
b) Use Fisher’s exact test. 


EX 135 A comparison of life lengths (hours) between two types of bulbs gave the following result: 


Sample size Mean Stand. Dev. 
Type A 20 1128 62 
Type B 20 1236 83 


a) Test whether there is a difference in quality between the two types of bulbs under the assumption that life 
lengths are normally distributed. 
b) Repeat the test in a) but now without assuming that life lengths are normally distributed. [Hint: Use the CLT.] 


EX 136 Two varieties of wheat A and B were grown in 12 different areas. The yield is summarized in the following 
table: 


Area   1    2    3    4    5    6    7    8    9    10   11   12
A      24   16   21   24   26   12   17   21   25   19   29   22
B      21   17   20   25   21   13   15   19   21   22   24   22


Assume that yield in an area is normally distributed and test whether the difference in yield is significant. 


EX 137 We have seen how the Chi-square principle can be used for a variety of tests. Here is another example, called 
the Median test. (There are several versions of this test.) 


The population median is defined in Ch. 2.1. As an estimator of this one may take the sample median, defined as the 
middle point of the ranked data in a sample, or the average of the two middle points in case of an even sample size. 


Test whether the following two series of data come from populations with the same median. 


A 44 40 46 22 51 41 48 38 58 60 28 40 


[Hint: Rank the observations in each series, compute the sample median m for the combined series and count the 
number of observations that are above or below m in each of the two series. Then you summarize the result in a 2 x 2 
table with frequencies of the two variables (above m /below m) and (Series A/ Series B). The classical Chi-square test 
will then give you the answer.] 


EX 138 A beer-tasting Binomial experiment. 4 glasses of beer A and 4 glasses of beer B are served sequentially in
random order to a person who has to decide which of the beers he is tasting.


a) How would you in practice arrange the experiment so that the variable Y = 'Number of correct answers'
~ Binomial(n = 8, p = 1/2)? Why assume that p = 1/2?

b) Assume that the person gives correct answers in y cases. What is the smallest value of y for which the hypothesis
$H_0: p = 1/2$ is rejected by a one-sided test at the 5% level?

c) When people are appointed as professional tasters of beer, coffee, tea, etc. they have to undergo tests in very 
long series. Assume that the person has to compare 50 glasses of beer of each kind instead of 4. Which is the 
smallest number of correct answers required to reject the hypothesis in b)? 


EX 139 In 160 families with four children the number of boys (Y ) was: 


Y 0 1 2 3 4 


Frequency 6 38 58 47 11 


Test whether the frequencies are in agreement with a variable that is distributed Binomial(n = 4, p = 0.516). 
[Hint: See EX 4.] 


a) By using the Kolmogorov test. 
b) By using a Chi-square test. 


EX 140 Independent measurements of the viscosity of a certain substance were made on two days, with the
following result:


Day1: 37.0 31.4 34.4 33.3 34.9 36.2 31.0 33.5 33.7 33.4 34.8 30.8 32.9 34.3 33.3 
Day2: 28.4 31.3 28.7 32.1 31.9 32.8 30.2 30.2 32.4 30.7 


Has the population distribution changed from one day to the next? 


a) Use the Smirnov two-sample test. [Hint: Rank the observations within each sample and estimate the two
sample cdf's as described in Ch. 6.2.4. Then search for the largest difference between the two cdf's. It could
help to make a plot.]

b) Compare the result in a) with the result that is obtained by assuming that both series are normally distributed. 
Conclusions? 


EX 141 Repeat the analysis of the two varieties of wheat in EX 136 by using the sign test. 


EX 142 15 students were ranked according to their results in Mathematics and Statistics with the following result:


Math.   3   5   1   12   10   8   6   9   2   15   13   7    11   4   14
Stat.   2   1   3   15   12   5   9   4   6   13   14   10   7    8   11


Is there a significant association between the two series? [Hint: Compare with EX 98.] 


EX 143 In a clinical trial one wants to study the effect of a drug on the concentration of a substance in blood. In 
a pilot study where the concentration was measured before and after the drug was added, the following result 
was obtained: 


Before   1.10   0.98   0.95   0.99   1.05   1.20   0.96   1.07   0.96   1.06
After    1.05   0.95   0.90   0.98   1.01   1.08   0.94   1.08   0.98   1.04


The hypothesis of interest is $H_0: \mu_D = 0$ against $H_a: \mu_D \neq 0$.


a) Test the hypotheses by means of a p-value argument under the assumption that the mean difference is normally 
distributed. Give reasons for the assumption. 

b) Let the estimates from the pilot study represent the true population parameters and assume that the variance
of the difference is the same under $H_0$ and $H_a$. Determine a rejection region (RR) as a function of the sample
size n under normality assumptions. The type-I error is 0.05. [Hint: Cf. Ch. 6.3.]

c) Study the power function for the test in b). For which values of $\mu_D$ is the power larger than 0.90 when n = 100?


EX 144 In 2003 it was found that the proportion disabled (p) who were full-time workers in service profession was 
8%. Ten years later it was decided to plan a study to see whether this proportion had changed. In a sequentially 
collected sample it was found that the proportion stabilized around 3/20 = 0.15. 


The hypothesis to test is $H_0: p = 0.08$ against $H_a: p \neq 0.08$. Determine the sample size, n, required to get a
power of at least 0.90 when p = 0.15. Also, determine the rejection region (RR).


EX 145 $(Y_i)_{i=1}^{n}$ are iid with $Y_i \sim N(\mu, \sigma^2)$. One wants to test $H_0: \sigma^2 = \sigma_0^2$ against $H_a: \sigma^2 \neq \sigma_0^2$ based on the
test statistic $T = (n-1)S^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ with a type-I error of 5%.


a) Specify a RR of the form $T < a$ or $T > b$ and derive the power function.
b) Let n = 10 and study the power as a function of $R = \sigma^2/\sigma_0^2$.


EX 146 In EX 120, where iid observations were distributed Exponential($\lambda$), we determined the RR for testing
$H_0: \lambda = \lambda_0$ against $H_a: \lambda \neq \lambda_0$. Express the power of this test as a function of the ratio
$R = \lambda/\lambda_0$. For which values of R is the power larger than 0.90?


EX 147 A new rapid method to measure concentration of a certain substance was tested against an exact method 
with the following result: 


Exact method (X ) 1 2 3 4 5 6 


New method (Y ) 1.2 1.9 3.1 4.2 4.7 5.9 


a) Apply the model $E(Y \mid x) = \alpha + \beta x$ and test the hypotheses $H_0: \beta = 1$ against $H_a: \beta \neq 1$ and
$H_0: \alpha = 0$ against $H_a: \alpha \neq 0$. [Hint: Cf. EX 104.]
b) If 'non-significant parameters' appear in a), formulate an alternative model and perform the test.


EX 148 The concentration of a substance in blood (Y) was measured and compared with a known concentration (X). 
The following result was obtained: 


x     Y
1     1.1    0.7    1.8    0.4
3     3.0    1.4    4.9    4.4    4.5
5     7.3    8.2    6.2
10    12.0   13.1   12.6   13.2
15    18.7   19.7   17.4   17.1


Test whether the model $E(Y \mid X = x) = \alpha + \beta x$ is adequate. [Hint: Cf. EX 105.]


EX 149 In an experiment one studied the relation between x = Temperature in minus degrees Celsius needed to 
reach the freezing point and Y = Concentration of an alcohol at which the freezing point was reached. The result was: 


x   0.5        1          4          16
Y   1.2 1.5    1.9 2.1    3.9 4.1    7.8 8.2


Can the relation be described by a linear regression function?


[WARNING! The alcohol is not ethyl so don't put a bottle of Champagne (12-13%) in your freezer at
about -16°C!]


EX 150 In the preceding example the linear model had to be abandoned. Plot the data and try to find a non-linear
model that fits the data better.


[Hint: Try some of the non-linear models in Ch. 2.3.1 and test whether the linearized version can be accepted.] 


EX 151 The body weight (Y) was recorded for 64 men and women after a diet period. Let x be the
initial weight and let z be a variable taking the value 1 for men and 0 for women. By running the model
$E(Y \mid x, z) = \alpha + \beta_1 x + \beta_2 z + \beta_3 x \cdot z$ one obtained the Sum of Squares for Error SSE = 18.4381. With the model
$E(Y \mid x) = \alpha + \beta_1 x$ the Sum of Squares increased to SSE = 18.7454.


Formulate and test relevant hypotheses and draw conclusions. 


[Hint: See EX 107.] 


EX 152 A frequently used relation in econometrics is the production function $Q(I, P) = \alpha I^{\beta_1} P^{\beta_2}$, where Q =
consumed quantity, I = income level of prospective consumers and P = price of the commodity. The parameters
$\beta_1$ and $\beta_2$ are interpreted as Income elasticity and Price elasticity, respectively. Estimate the parameters in the
linearized model from the following data:


(I = Total domestic private consumption (million SEK), Q = Yearly consumed quantity of strong beer (million liter), P =
Total price of strong beer (million SEK). All prices in 1988 monetary value.)


Year   I   P   Q
-73 421 027 778.4 24.1 
-74 437 067 754.5 25.0 
-75 453 748 770.7 24.9 
-76 472 681 756.3 24.5 
-77 485 582 1321.1 44.5 
-78 491 919 2193.8 76.6 
-79 507 296 2494.6 89.7 
-80 497 081 2640.8 91.8 
-81 489 929 2669.9 90.4 
-82 506 769 2840.3 99.5 
-83 507 822 3070.2 101.1 
-84 515 257 3286.3 104.9 
-85 527 904 3645.2 105.8 
-86 554 850 4196.2 120.1 
-87 584 427 4746.3 131.5 
-88 563 293 5018.0 147.9 
[Hint: Use a computer!] 


EX 153 Time to recovery after a certain disease varies according to the Weibull distribution with survival function
$S(y)$ as given in Ch. 2.2.2. Test $H_0: \alpha = 1$ against $H_a: \alpha \neq 1$ based on the following data on the proportion of
patients that are recovered at y years, which is used as an estimator $\hat{S}(y)$ of $S(y)$.


y              0.47   0.64   0.89   1.08   1.41
$\hat{S}(y)$   5/6    4/6    3/6    2/6    1/6


[Hint: Linearize the survival function and use the techniques for analyzing linear models in Ch. 6.6.1.] 


Answers to Supplementary Exercises


Solutions to Supplementary Exercises Ch. 3 


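EX 28 $F_Y(y) = y/b,\ 0 < y < b \Rightarrow F_{Y_{(1)}}(y) = 1 - \left[1 - \frac{y}{b}\right]^n$. From derivation rule (5) in Ch. 2.3.3,


$f_{Y_{(1)}}(y) = -n\left[1 - \frac{y}{b}\right]^{n-1}\left(-\frac{1}{b}\right) = \frac{n}{b}\left[1 - \frac{y}{b}\right]^{n-1}$.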


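EX 29 $F_Y(y) = 1 - e^{-\lambda y},\ y > 0 \Rightarrow F_{Y_{(1)}}(y) = 1 - e^{-n\lambda y}$. From derivation rule (5) in Ch. 2.3.3,


$f_{Y_{(1)}}(y) = -e^{-n\lambda y}(-n\lambda) = n\lambda e^{-n\lambda y}$.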


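EX 30 Exact mean: $E(\hat{p}(1-\hat{p})/n) = \frac{1}{n}E(\hat{p} - \hat{p}^2) = \frac{1}{n}\left(E(\hat{p}) - E(\hat{p}^2)\right) = \frac{1}{n}\left(E(\hat{p}) - (V(\hat{p}) + E^2(\hat{p}))\right) =$


$\frac{1}{n}\left(p - \frac{p(1-p)}{n} - p^2\right) = \frac{1}{n}\,p(1-p)\left(1 - \frac{1}{n}\right) = \frac{(n-1)}{n}\cdot\frac{p(1-p)}{n}$


Approximate mean:


First we notice that $g(p) = p - p^2 \Rightarrow g'(p) = 1 - 2p,\ g''(p) = -2$. (13) with $\mu = p,\ \sigma^2 = \frac{p(1-p)}{n}$ gives


$E(\hat{p}(1-\hat{p})/n) = \frac{1}{n}E(\hat{p} - \hat{p}^2) \approx \frac{1}{n}\left(p - p^2 + \frac{(-2)}{2}\cdot\frac{p(1-p)}{n}\right) = \frac{(n-1)}{n}\cdot\frac{p(1-p)}{n}$.


It follows that $\hat{V}(\hat{p}) = \frac{\hat{p}(1-\hat{p})}{n-1}$ is unbiased for $V(\hat{p}) = \frac{p(1-p)}{n}$.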
n= n 


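EX 31 $\hat{p}_i,\ i = 1, 2$, have $\mu_i = p_i$ and $\sigma_i^2 = p_i(1-p_i)/n_i$, and $\sigma_{12} = 0$ (due to independence). Thus we get


$V\left(\frac{\hat{p}_1}{\hat{p}_2}\right) \approx \left(\frac{p_1}{p_2}\right)^2\left[\frac{p_1(1-p_1)/n_1}{p_1^2} + \frac{p_2(1-p_2)/n_2}{p_2^2}\right] = \left(\frac{p_1}{p_2}\right)^2\left[\frac{1-p_1}{n_1 p_1} + \frac{1-p_2}{n_2 p_2}\right]$.


The expression for the estimated variance is finally obtained by replacing $p_i$ by $Y_i/n_i$.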


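EX 32 From (9b) we know that $\mu = E(S^2) = \sigma^2$ and (don't use $\sigma$ in the following to avoid confusion)


$V(S^2) = [\text{if } Y_i \sim N(\mu, \sigma^2)] = \frac{2\sigma^4}{(n-1)}$. The latter relation follows since


$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1) \Rightarrow V(S^2) = \frac{\sigma^4}{(n-1)^2}\,2(n-1) = \frac{2\sigma^4}{(n-1)}$.


Since $S = \sqrt{S^2} = (S^2)^{1/2}$ we consider the function $g(y) = y^{1/2}$ with $g'(y) = \frac{1}{2y^{1/2}}$ and $g''(y) = -\frac{1}{4y^{3/2}}$. Thus,


a) $E(S) \approx (\sigma^2)^{1/2} + \frac{1}{2}\left(-\frac{1}{4\sigma^3}\right)V(S^2) = \sigma - \frac{V(S^2)}{8\sigma^3}$, $V(S) \approx \left(\frac{1}{2\sigma}\right)^2 V(S^2) = \frac{V(S^2)}{4\sigma^2}$


b) $E(S) \approx \sigma - \frac{1}{8\sigma^3}\cdot\frac{2\sigma^4}{(n-1)} = \sigma - \frac{\sigma}{4(n-1)}$, $V(S) \approx \frac{2\sigma^4}{4\sigma^2(n-1)} = \frac{\sigma^2}{2(n-1)}$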

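

a)


$F_Y(y) = \frac{y}{b},\ 0 < y < b$, and $F_{Y_{(n)}}(y) = (F_Y(y))^n = \frac{y^n}{b^n} \Rightarrow f_{Y_{(n)}}(y) = \frac{n y^{n-1}}{b^n}$, so

$E(Y_{(n)}) = \frac{n}{b^n}\int_0^b y^n\,dy = \frac{n}{b^n}\left[\frac{y^{n+1}}{n+1}\right]_0^b = \frac{n}{(n+1)}\,b$. Thus $\hat{b} = \frac{(n+1)}{n}\,Y_{(n)}$ is unbiased for b.

$V(\hat{b}) = \frac{(n+1)^2}{n^2}\,V(Y_{(n)})$. Now, $E(Y_{(n)}^2) = \frac{n}{b^n}\int_0^b y^{n+1}\,dy = \frac{n}{(n+2)}\,b^2$, so

$V(\hat{b}) = \frac{(n+1)^2}{n^2}\left(\frac{n}{(n+2)} - \frac{n^2}{(n+1)^2}\right)b^2 = \frac{(n+1)^2}{n^2}\cdot\frac{n}{(n+2)(n+1)^2}\,b^2 = \frac{b^2}{n(n+2)}$.


OLS estimator: $SS = \sum_{i=1}^{n}\left(y_i - \frac{b}{2}\right)^2 \Rightarrow \frac{dSS}{db} = -\sum_{i=1}^{n}\left(y_i - \frac{b}{2}\right) = 0 \Rightarrow \hat{b}_{OLS} = 2\sum_{i=1}^{n} Y_i/n = 2\bar{Y}$. This is easily seen to be unbiased for b. The variance is

$V(\hat{b}_{OLS}) = 4V(\bar{Y}) = \frac{4}{n}V(Y) = [\text{Cf. Ch. 2.2.2}] = \frac{4}{n}\cdot\frac{b^2}{12} = \frac{b^2}{3n}$.


Moment estimator: $\bar{Y} = b/2 \Rightarrow \hat{b}_{Mom} = 2\bar{Y} = \hat{b}_{OLS}$.


Relative efficiency (RE) $= \frac{V(\hat{b})}{V(\hat{b}_{OLS})} = \frac{b^2/n(n+2)}{b^2/3n} = \frac{3}{n+2} \le 1$ with equality if n = 1. In the latter case
$\hat{b}$ and $\hat{b}_{OLS}$ are identical.


b) b = Length of each red light period. From the data we obtain $y_{(n)} = 46$ and $\bar{y} = 20.0$, so the estimates are
$\hat{b} = \frac{(10+1)}{10}\cdot 46 = 50.6$ and $\hat{b}_{OLS} = \hat{b}_{Mom} = 2\cdot 20.0 = 40.0$. Of these two the first one should be more
reliable since RE in this case is 1/4.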
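

EX 46


a) OLS estimator: Remember that $E(Y_i) = n_i p$ and $V(Y_i) = n_i p(1-p)$.


$SS = \sum_{i=1}^{n}(y_i - n_i p)^2 \Rightarrow \frac{dSS}{dp} = -2\sum_{i=1}^{n} n_i(y_i - n_i p) = -2\left(\sum_{i=1}^{n} n_i y_i - p\sum_{i=1}^{n} n_i^2\right) = 0 \Rightarrow \hat{p}_{OLS} = \frac{\sum_{i=1}^{n} n_i Y_i}{\sum_{i=1}^{n} n_i^2}$


$E(\hat{p}_{OLS}) = \frac{1}{\sum_{i=1}^{n} n_i^2}\sum_{i=1}^{n} n_i E(Y_i) = \frac{1}{\sum_{i=1}^{n} n_i^2}\sum_{i=1}^{n} n_i \cdot n_i p = p$ (Unbiased.)

$V(\hat{p}_{OLS}) = \frac{1}{\left(\sum_{i=1}^{n} n_i^2\right)^2}\sum_{i=1}^{n} n_i^2 V(Y_i) = \frac{1}{\left(\sum_{i=1}^{n} n_i^2\right)^2}\sum_{i=1}^{n} n_i^3 p(1-p) = p(1-p)\frac{\sum_{i=1}^{n} n_i^3}{\left(\sum_{i=1}^{n} n_i^2\right)^2}$


ML estimator:

$L = \prod_{i=1}^{n}\binom{n_i}{y_i} p^{y_i}(1-p)^{n_i - y_i} = C \cdot p^{\sum_{i=1}^{n} y_i}(1-p)^{\sum_{i=1}^{n}(n_i - y_i)}$ [C is a constant which doesn't contain p]


$\ln L = \ln C + \sum_{i=1}^{n} y_i \ln p + \sum_{i=1}^{n}(n_i - y_i)\ln(1-p) \Rightarrow \frac{d\ln L}{dp} = \frac{\sum_{i=1}^{n} y_i}{p} - \frac{\sum_{i=1}^{n}(n_i - y_i)}{(1-p)}$


and putting this equal to zero yields $\hat{p}_{ML} = \sum_{i=1}^{n} Y_i / \sum_{i=1}^{n} n_i$.


$E(\hat{p}_{ML}) = \frac{1}{\sum_{i=1}^{n} n_i}\sum_{i=1}^{n} E(Y_i) = \frac{1}{\sum_{i=1}^{n} n_i}\sum_{i=1}^{n} n_i p = p$ (Unbiased)


$V(\hat{p}_{ML}) = \frac{1}{\left(\sum_{i=1}^{n} n_i\right)^2}\sum_{i=1}^{n} V(Y_i) = \frac{1}{\left(\sum_{i=1}^{n} n_i\right)^2}\sum_{i=1}^{n} n_i p(1-p) = \frac{p(1-p)}{\sum_{i=1}^{n} n_i}$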




Comparison: $RE = \frac{V(\hat{p}_{ML})}{V(\hat{p}_{OLS})} = \frac{\left(\sum_{i=1}^{n} n_i^2\right)^2}{\sum_{i=1}^{n} n_i \cdot \sum_{i=1}^{n} n_i^3} \le 1$. This follows from the Cauchy-Schwarz inequality (Cf. Ch. 2.3.5)
by letting $x_i = n_i^{1/2}$ and $y_i = n_i^{3/2}$. (It may suffice if one demonstrates the inequality numerically by choosing a few
values of $n_i$.)

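

(From now on we skip the upper and lower index in the summation signs.)


b) To show that the ML estimator is BLUE, put $T_L = \sum a_i Y_i \Rightarrow E(T_L) = \sum a_i E(Y_i) = \sum a_i n_i p = p\sum a_i n_i = (\text{Put}) = p \Rightarrow \sum a_i n_i - 1 = 0$ (1)


$Q = V(T_L) + \lambda\left[\sum a_i n_i - 1\right] = \sum a_i^2 V(Y_i) + \lambda\left[\sum a_i n_i - 1\right] = \sum a_i^2 n_i p(1-p) + \lambda\left[\sum a_i n_i - 1\right] \Rightarrow$


$\frac{\partial Q}{\partial a_i} = 2a_i n_i p(1-p) + \lambda n_i = 0 \Rightarrow a_i = \frac{-\lambda}{2p(1-p)} = \lambda'$, say. (2) Putting this into (1) gives $\lambda' = 1/\sum n_i$
which inserted into (2) gives $a_i = 1/\sum n_i$. Thus, $\hat{p}_{BLUE} = \sum Y_i/\sum n_i = \hat{p}_{ML}$.


To show that the ML estimator is a MVE we determine the C-R limit. From a) above,


$\frac{d^2\ln L}{dp^2} = [\text{cf. derivation rules in Ch. 2.3.3}] = -\frac{\sum y_i}{p^2} - \frac{\sum (n_i - y_i)}{(1-p)^2}$, and


$-E\left(\frac{d^2\ln L}{dp^2}\right) = [\text{Replacing } y_i \text{ by } Y_i] = \frac{\sum n_i p}{p^2} + \frac{\sum (n_i - n_i p)}{(1-p)^2} = \frac{\sum n_i}{p} + \frac{\sum n_i}{(1-p)} = \frac{\sum n_i}{p(1-p)} = I(p)$. Thus, $V(\hat{p}_{ML}) = \frac{1}{I(p)}$,


so the ML estimator is a MVE.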


c) Let $n_i$ = Total number of students in room i, $Y_i$ = Number with back/neck pain in room i


From the data we get $\sum n_i = 90$, $\sum n_i^2 = 2750$, $\sum y_i = 6$, $\sum n_i y_i = 175$. This leads to the following
estimates: $\hat{p}_{OLS} = 175/2750 = 0.064$ (6.4%), $\hat{p}_{ML} = 6/90 = 0.067$ (6.7%).


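EX 47


$L = \prod (1-p)^{y_i - 1} p = (1-p)^{\sum y_i - n}\, p^n \Rightarrow \ln L = \left(\sum y_i - n\right)\ln(1-p) + n\ln p \Rightarrow$


$\frac{d\ln L}{dp} = -\frac{\sum y_i - n}{(1-p)} + \frac{n}{p} = 0 \Rightarrow \hat{p}_{ML} = \frac{n}{\sum y_i}$.


Define the variable $Y_i$ = Number of trials until the first '1' appears. Then n = 3 and $y_1 = 4$, $y_2 = 1$, $y_3 = 3$.


Thus $\hat{p}_{ML} = \frac{3}{8} = 0.375$.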

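EX 48


a) $L = \prod (2\pi\sigma^2)^{-1/2} e^{-(y_i - \beta x_i)^2/2\sigma^2} = (2\pi\sigma^2)^{-n/2} e^{-\sum (y_i - \beta x_i)^2/2\sigma^2} \Rightarrow$


$\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum (y_i - \beta x_i)^2}{2\sigma^2} \Rightarrow \frac{d\ln L}{d\beta} = \frac{\sum x_i(y_i - \beta x_i)}{\sigma^2} = 0 \Rightarrow \hat{\beta}_{ML} = \frac{\sum x_i Y_i}{\sum x_i^2}$.


$\hat{\beta}_{ML}$ is simply the BLUE estimator that was found in EX 30, without any distributional assumptions.
In Ch. 2.2.2 (4) it was stated in the comments to the normal distribution that a linear form of normally distributed
variables is itself normally distributed. Applying this to $\hat{\beta}_{ML} = \sum l_i Y_i$, with $l_i = \frac{x_i}{\sum x_i^2}$, shows that $\hat{\beta}_{ML}$ is
exactly normally distributed with $E(\hat{\beta}_{ML}) = \frac{\sum x_i \beta x_i}{\sum x_i^2} = \beta$ and $V(\hat{\beta}_{ML}) = \frac{\sigma^2\sum x_i^2}{\left(\sum x_i^2\right)^2} = \frac{\sigma^2}{\sum x_i^2}$.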


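b) From the expression for ln L in a) we get


$\frac{d\ln L}{d\sigma^2} = [\text{The derivative is with respect to } \sigma^2, \text{ not } \sigma!] = -\frac{n}{2\sigma^2} + \frac{\sum (y_i - \beta x_i)^2}{2\sigma^4} = 0 \Rightarrow \hat{\sigma}^2_{ML} = \frac{\sum (Y_i - \hat{\beta}_{ML} x_i)^2}{n}$.
[Notice that the ML estimator has been inserted for $\beta$.] We now determine the
distribution of this by using Cochran's theorem (7) in Ch. 3.1.


$\sum (Y_i - \beta x_i)^2 = \sum \left((Y_i - \hat{\beta}_{ML} x_i) + (\hat{\beta}_{ML} x_i - \beta x_i)\right)^2 = \sum (Y_i - \hat{\beta}_{ML} x_i)^2 + \sum (\hat{\beta}_{ML} x_i - \beta x_i)^2$

since the cross product vanishes. In fact

$2\sum (Y_i - \hat{\beta}_{ML} x_i)(\hat{\beta}_{ML} x_i - \beta x_i) = 2(\hat{\beta}_{ML} - \beta)\sum (Y_i - \hat{\beta}_{ML} x_i)x_i = 2(\hat{\beta}_{ML} - \beta)\left(\sum x_i Y_i - \hat{\beta}_{ML}\sum x_i^2\right) =$
$2(\hat{\beta}_{ML} - \beta)\left(\sum x_i Y_i - \sum x_i Y_i\right) = 0$. Thus, $\sum (Y_i - \beta x_i)^2 = \sum (Y_i - \hat{\beta}_{ML} x_i)^2 + (\hat{\beta}_{ML} - \beta)^2\sum x_i^2$.
Divide each term in the latter expression by $\sigma^2$ and write the identity as $Q_1 = Q_2 + Q_3$.


We now find the distributions of $Q_1$ and $Q_3$.


$Y_i \sim N(\beta x_i, \sigma^2) \Rightarrow \frac{(Y_i - \beta x_i)}{\sigma} \sim N(0,1) \Rightarrow \frac{(Y_i - \beta x_i)^2}{\sigma^2} \sim \chi^2(1) \Rightarrow Q_1 = \frac{\sum (Y_i - \beta x_i)^2}{\sigma^2} \sim \chi^2(n)$

$\hat{\beta}_{ML} \sim N\left(\beta, \frac{\sigma^2}{\sum x_i^2}\right) \Rightarrow \frac{(\hat{\beta}_{ML} - \beta)}{\sigma/\sqrt{\sum x_i^2}} \sim N(0,1) \Rightarrow Q_3 = \frac{(\hat{\beta}_{ML} - \beta)^2\sum x_i^2}{\sigma^2} \sim \chi^2(1)$


From Cochran's theorem it follows that $Q_2 = \frac{\sum (Y_i - \hat{\beta}_{ML} x_i)^2}{\sigma^2} \sim \chi^2(n-1) \Rightarrow E(\hat{\sigma}^2_{ML}) = \frac{\sigma^2}{n}(n-1) = \sigma^2\left(1 - \frac{1}{n}\right)$,


whereas $S'^2 = \frac{\sum (Y_i - \hat{\beta}_{ML} x_i)^2}{(n-1)}$ has $E(S'^2) = \frac{\sigma^2}{(n-1)}(n-1) = \sigma^2$.


So, $\hat{\sigma}^2_{ML}$ is not unbiased, but the corrected estimator $S'^2$ is unbiased.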
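

EX 49

BLUE of $\mu$. $T_L = \sum a_i \bar{Y}_i$ has $E(T_L) = \sum a_i E(\bar{Y}_i) = \sum a_i \mu = \mu\sum a_i = (\text{Put}) = \mu \Rightarrow$

$\sum a_i - 1 = 0$ (1)

$V(T_L) = \sum a_i^2 V(\bar{Y}_i) = \sum a_i^2\sigma^2/n_i = \sigma^2\sum a_i^2/n_i$. Thus, $Q = V(T_L) + \lambda\left[\sum a_i - 1\right] =$


$\sigma^2\sum a_i^2/n_i + \lambda\left[\sum a_i - 1\right] \Rightarrow \frac{\partial Q}{\partial a_i} = 2\sigma^2 a_i/n_i + \lambda = 0 \Rightarrow a_i = \frac{-\lambda n_i}{2\sigma^2} = \lambda' n_i$ (2)


(2) inserted into (1) gives $\sum \lambda' n_i = \lambda'\sum n_i = 1 \Rightarrow \lambda' = 1/\sum n_i$, which inserted into (2) gives


$a_i = n_i/\sum n_i$, so $\hat{\mu}_{BLUE} = \sum n_i \bar{Y}_i/\sum n_i$.


$V(\hat{\mu}_{BLUE}) = \sigma^2\sum \frac{n_i^2}{\left(\sum n_i\right)^2}\cdot\frac{1}{n_i} = \frac{\sigma^2}{\sum n_i}$.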


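BLUE of $\sigma^2$. Notice first that $\frac{(n_i - 1)S_i^2}{\sigma^2} \sim \chi^2(n_i - 1)$ (Cf. EX 16) $\Rightarrow E(S_i^2) = \frac{\sigma^2}{(n_i - 1)}(n_i - 1) = \sigma^2$ and


$V(S_i^2) = \frac{\sigma^4}{(n_i - 1)^2}V\left(\chi^2(n_i - 1)\right) = \frac{\sigma^4}{(n_i - 1)^2}\,2(n_i - 1) = \frac{2\sigma^4}{(n_i - 1)}$.


Put $T_L = \sum a_i S_i^2 \Rightarrow E(T_L) = \sum a_i E(S_i^2) = \sigma^2\sum a_i = (\text{Put}) = \sigma^2 \Rightarrow \sum a_i - 1 = 0$ (3)


$V(T_L) = \sum a_i^2 V(S_i^2) = 2\sigma^4\sum \frac{a_i^2}{(n_i - 1)}$. Thus,


$Q = V(T_L) + \lambda\left[\sum a_i - 1\right] \Rightarrow \frac{\partial Q}{\partial a_i} = \frac{4\sigma^4 a_i}{(n_i - 1)} + \lambda = 0 \Rightarrow a_i = \frac{-\lambda(n_i - 1)}{4\sigma^4} = \lambda'(n_i - 1)$ (4), which inserted into (3) yields


$\lambda'\sum (n_i - 1) = 1 \Rightarrow \lambda' = \frac{1}{\sum (n_i - 1)}$, and this inserted into (4) gives $a_i = \frac{(n_i - 1)}{\sum (n_i - 1)}$, so $\hat{\sigma}^2_{BLUE} = \frac{\sum (n_i - 1)S_i^2}{\sum (n_i - 1)}$.


$V(\hat{\sigma}^2_{BLUE}) = \frac{1}{\left(\sum (n_i - 1)\right)^2}\sum (n_i - 1)^2 V(S_i^2) = \frac{2\sigma^4}{\sum (n_i - 1)}$.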

EX 50 In this example the cell probabilities are specified as hypothetical proportions. In Ch.6, where tests of 
hypothesis are considered, there will be more examples of this. 


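$L = C \cdot p^{y_1}(2p)^{y_2}(1-3p)^{y_3} \Rightarrow \ln L = \ln C + y_1\ln p + y_2(\ln 2 + \ln p) + y_3\ln(1-3p) \Rightarrow$


$\frac{d\ln L}{dp} = \frac{y_1}{p} + \frac{y_2}{p} - \frac{3y_3}{(1-3p)} = 0 \Rightarrow (y_1 + y_2)(1-3p) = 3y_3 p \Rightarrow \hat{p}_{ML} = \frac{y_1 + y_2}{3(y_1 + y_2 + y_3)} = \frac{y_1 + y_2}{3n}$.


The corresponding ML estimator is $\hat{p}_{ML} = \frac{Y_1 + Y_2}{3n}$.


$E(\hat{p}_{ML}) = \frac{1}{3n}\left(E(Y_1) + E(Y_2)\right) = [\text{Cf. Ch. 2.2.1 (6)}] = \frac{1}{3n}(np + n \cdot 2p) = p$ (Unbiased).


$V(\hat{p}_{ML}) = \frac{1}{9n^2}\left(V(Y_1) + V(Y_2) + 2Cov(Y_1, Y_2)\right) = \frac{1}{9n^2}\left(np(1-p) + n \cdot 2p(1-2p) - 2n \cdot p \cdot 2p\right) =$


$\frac{np}{9n^2}\left(1 - p + 2(1-2p) - 4p\right) = \frac{p(1-3p)}{3n}$


We now find the C-R limit.


$\frac{d^2\ln L}{dp^2} = -\frac{y_1}{p^2} - \frac{y_2}{p^2} - \frac{9y_3}{(1-3p)^2}$, so that

$-E\left(\frac{d^2\ln L}{dp^2}\right) = \frac{np + n \cdot 2p}{p^2} + \frac{9n(1-3p)}{(1-3p)^2} = \frac{3n}{p} + \frac{9n}{(1-3p)} = \frac{3n}{p(1-3p)} = I(p)$


Since $V(\hat{p}_{ML}) = 1/I(p)$ we conclude that the ML estimator is MVE.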


Ch. 5


EX 63 Data consist of naturally paired observations that are strongly dependent. Therefore the approach in EX 53
can't be used. Instead we consider the weight-loss D = X - Y for each subject, which gives the series


Subject   1     2     3      4     5      6     7     8     9     10
D         0.2   1.1   -0.1   0.3   -0.4   0.8   0.2   0.7   0.3   0.7


a)


Here we get $n = 10$, $\hat{\mu}_D = 0.40$, $\hat{\sigma}_D^2 = 0.2022$ ($\hat{\sigma}_D = 0.4497$).


The 95% CI for $\mu_D$ is $0.40 \pm C\,\frac{0.4497}{\sqrt{10}}$, where C is obtained from $P(T(9) > C) = 0.025 \Rightarrow C = 2.262$.

Thus, the 95% CI is (0.08, 0.72), so the conclusion is that the training program has a significant positive effect
on weight-loss.


The 95% CI for $\sigma_D^2$ is $\left(\frac{(n-1)\hat{\sigma}_D^2}{b}, \frac{(n-1)\hat{\sigma}_D^2}{a}\right)$, where a and b are determined from $P(a < \chi^2(9) < b) = 0.95$:


$P(\chi^2(9) > b) = 0.025 \Rightarrow b = 19.0226$
$P(\chi^2(9) > a) = 0.975 \Rightarrow a = 2.7004$


This gives the CI (0.096, 0.674) and since the latter is far below 0.7 (the value for females) we conclude that the
variation in weight-loss is significantly smaller among males.
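
The two intervals above can be checked with a few lines of code. This sketch (Python, standard library only) starts from the summary statistics quoted in the solution, n = 10, mean 0.40 and variance 0.2022, and takes the t and Chi-square percentiles from the text rather than computing them; the variable names are illustrative.

```python
from math import sqrt

n, mean_d, var_d = 10, 0.40, 0.2022   # summary statistics quoted in the solution

# Percentiles quoted in the text for 9 degrees of freedom.
t_upper = 2.262          # P(T(9) > C) = 0.025
chi2_b = 19.0226         # P(chi2(9) > b) = 0.025
chi2_a = 2.7004          # P(chi2(9) > a) = 0.975

half_width = t_upper * sqrt(var_d / n)
ci_mean = (mean_d - half_width, mean_d + half_width)
ci_var = ((n - 1) * var_d / chi2_b, (n - 1) * var_d / chi2_a)

print(f"95% CI for the mean:     ({ci_mean[0]:.2f}, {ci_mean[1]:.2f})")
print(f"95% CI for the variance: ({ci_var[0]:.3f}, {ci_var[1]:.3f})")
```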


A weight-loss was observed for Y = 8 subjects of n = 10, giving $\hat{p} = 0.80$. From the large-sample expression
in EX 55 a) we get $0.80 \pm 1.96\sqrt{0.80 \cdot 0.20/10}$, or (0.55, 1.05), which is unreasonable since the upper limit exceeds 1.


In order to use the conservative expression in (23) we need the percentiles $F_{.975}(6,20) = 3.13$ and
$F_{.975}(18,4)$. The latter is hard to obtain from tables in textbooks, but one solution may be to use linear
interpolation between $F_{.975}(20,4) = 8.56$ and $F_{.975}(15,4) = 8.66$, yielding $F_{.975}(18,4) = 8.60$. The
latter is close to the true value 8.59 obtained by using the function finv in SAS (Cf. EX 56).

The 95% CI is $\left(\frac{8}{8 + (10 - 8 + 1)\cdot 3.13},\ \frac{(8+1)\cdot 8.59}{10 - 8 + (8+1)\cdot 8.59}\right) = (0.107, 0.976)$


From Ch. 5.4.1, $n = p(1-p)(C/B)^2$, where C = 1.645 since we want a CI of 90%. Thus,
$n = 0.8 \cdot 0.2\,(1.645/0.025)^2 = 693$.


The reason for choosing a 90% CI in this case is that we need to have Bound of Error small and this in turn is
due to the fact that p is quite close to 1.


The balance between the choice of confidence level and Bound of Error can sometimes be a delicate problem. 
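
The large-sample interval for p and the sample-size formula can be illustrated as follows. This is a sketch under the assumptions stated in the solution (Y = 8 of n = 10, Bound of Error B = 0.025, C = 1.645 for 90% confidence); the function name `sample_size` is an assumption, and rounding up to the next integer is a conventional choice.

```python
from math import ceil, sqrt

y, n = 8, 10
p_hat = y / n

# Large-sample (Wald) 95% interval; with n this small it can leave [0, 1].
half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
wald_ci = (p_hat - half_width, p_hat + half_width)


def sample_size(p, bound, c):
    """Smallest n with c * sqrt(p(1-p)/n) <= bound."""
    return ceil(p * (1 - p) * (c / bound) ** 2)


n_required = sample_size(0.8, 0.025, 1.645)

print(f"Wald interval: ({wald_ci[0]:.2f}, {wald_ci[1]:.2f})")
print(f"required n for B = 0.025 at 90% confidence: {n_required}")
```

The upper limit above 1 is what makes the large-sample interval unreasonable in this small sample.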


EX 64 Introduce the notations $\mu_F$ and $\mu_C$ for the population means in the two groups and $\sigma_F^2$ and $\sigma_C^2$ for the
population variances. We first make a 95% CI for the ratio $\sigma_C^2/\sigma_F^2$.


In accordance with EX 53 a) we get


$\frac{S_C^2/S_F^2}{c_2} < \frac{\sigma_C^2}{\sigma_F^2} < \frac{S_C^2/S_F^2}{c_1}$, where $P(c_1 < F(n_C - 1, n_F - 1) < c_2) = 0.95$.


$P(F(29,21) > c_2) = 0.025 \Rightarrow c_2 = 2.317$
$P(F(29,21) > c_1) = 0.975 \Rightarrow c_1 = 0.455$


$S_C^2/S_F^2 = 0.6362$ and gives the CI (0.27, 1.40).


Since this interval well covers 1 we can assume that the two population variances are equal.
The BLUE of the common variance is
$\hat{\sigma}^2 = \frac{(n_C - 1)S_C^2 + (n_F - 1)S_F^2}{n_C - 1 + n_F - 1} = \frac{29 \cdot 0.2305 + 21 \cdot 0.3623}{50} = 0.2859$


From EX 53 b) the 95% CI for $\mu_C - \mu_F$ is $\bar{Y}_C - \bar{Y}_F \pm C\sqrt{\hat{\sigma}^2(1/n_C + 1/n_F)}$, where C is determined from


$P(-C < T(50) < C) = 0.95 \Rightarrow P(T(50) > C) = 0.025 \Rightarrow C = 2.009$. This gives the CI (0.24, 0.85).


Since the CI is far above 0 the conclusion is that the mean AOD in the C group is significantly larger than in the
FAS group.
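
The arithmetic behind the variance-ratio interval and the pooled variance can be reproduced as below. This is a sketch only: the sample sizes n_C = 30 and n_F = 22 are inferred from the degrees of freedom F(29, 21), the F and t percentiles are those quoted in the solution, and the sample means are not restated in the solution, so only the half-width of the interval for the mean difference is computed.

```python
from math import sqrt

n_c, n_f = 30, 22                 # inferred from the degrees of freedom (29, 21)
s2_c, s2_f = 0.2305, 0.3623       # sample variances used in the solution
c1, c2 = 0.455, 2.317             # lower and upper 2.5% points of F(29, 21), from the text
t_upper = 2.009                   # P(T(50) > C) = 0.025, from the text

ratio = s2_c / s2_f
ci_ratio = (ratio / c2, ratio / c1)

pooled = ((n_c - 1) * s2_c + (n_f - 1) * s2_f) / (n_c + n_f - 2)
half_width = t_upper * sqrt(pooled * (1 / n_c + 1 / n_f))

print(f"variance ratio = {ratio:.4f}, 95% CI = ({ci_ratio[0]:.2f}, {ci_ratio[1]:.2f})")
print(f"pooled variance = {pooled:.4f}, half-width for the mean difference = {half_width:.2f}")
```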
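

EX 65 $Y \sim \text{Exponential}(\lambda)$ has $E(Y) = 1/\lambda$ and $V(Y) = 1/\lambda^2 \Rightarrow E(\bar{Y}) = 1/\lambda$ and $V(\bar{Y}) = \frac{1}{n\lambda^2}$.


According to the CLT $\frac{\bar{Y} - 1/\lambda}{\sqrt{1/(n\lambda^2)}} = \sqrt{n}(\lambda\bar{Y} - 1) \xrightarrow{D} Z \sim N(0,1)$.


From this we get $0.95 \approx P\left(-1.96 < \sqrt{n}(\lambda\bar{Y} - 1) < 1.96\right)$ and centering $\lambda$ finally gives the CI
$\left(\frac{1}{\bar{Y}}(1 - 1.96/\sqrt{n}),\ \frac{1}{\bar{Y}}(1 + 1.96/\sqrt{n})\right)$.


In order to calculate the expected length of the CI we need to know that $\sum_{i=1}^{n} Y_i \sim \text{Gamma}(\lambda, n)$, so (Cf. Ch.
2.2.2 (2)) $E\left(\frac{1}{\sum_{i=1}^{n} Y_i}\right) = \lambda\frac{\Gamma(n-1)}{\Gamma(n)} = \lambda\frac{\Gamma(n-1)}{(n-1)\Gamma(n-1)} = \frac{\lambda}{(n-1)}$. The expected length of the CI above is
thus $\frac{n\lambda}{(n-1)}\cdot\frac{2 \cdot 1.96}{\sqrt{n}} = [n = 50] = 0.5662\lambda$. This is to be compared with the CI in EX 61 a) when n = 50,
$\frac{\lambda}{(n-1)}\cdot\frac{(129.56 - 74.22)}{2} = 0.5654\lambda$.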


EX 66 From EX 46 $\hat p_W = \sum Y_i / \sum n_i$ has $E(\hat p_W) = p$ and $V(\hat p_W) = p(1-p)/\sum n_i$. Here $\sum Y_i$ has a Binomial distribution, so according to the CLT

$\frac{\hat p_W - p}{\sqrt{\hat p_W(1 - \hat p_W)/\sum n_i}} \xrightarrow{D} Z \sim N(0,1)$. (Cf. EX 23 c).)

This gives the 95% CI limits $\hat p_W \pm 1.96\sqrt{\hat p_W(1 - \hat p_W)/\sum n_i}$.

From the table we get $\hat p_W = 30/100 = 0.30$, so the CI is (0.21, 0.39).
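The interval in EX 66 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `wald_ci` is introduced here for illustration and is not part of the book.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(successes, trials, level=0.95):
    """Large-sample CI for a Binomial proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / trials
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width

# Figures from EX 66: 30 events among sum(n_i) = 100 trials.
low, high = wald_ci(30, 100)
print(f"({low:.2f}, {high:.2f})")
```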


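EX 67 $\hat\lambda = \bar y = 105.76$. (This is the OLS-, Moment-, BLUE- and ML estimate.)

$E(\hat\lambda) = \lambda$ and $V(\hat\lambda) = \lambda/n$, with $\hat V(\hat\lambda) = \hat\lambda/n$.

Now, following the same lines as in EX 23 c) it can be shown that

$\frac{\hat\lambda - \lambda}{\sqrt{\hat\lambda/n}} = \frac{\hat\lambda - \lambda}{\sqrt{\lambda/n}} \cdot \frac{\sqrt{\lambda/n}}{\sqrt{\hat\lambda/n}} \xrightarrow{D} Z \sim N(0,1) \Rightarrow \hat\lambda \pm 1.96\sqrt{\hat\lambda/n}$ gives a 95% CI for $\lambda$ in large samples. The CI is in this case (96, 116).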
Ch. 6


EX 109 


a) A 2x2 table may look as follows:

                     Shape
                 Round   Wrinkly   Total
Color   Yellow    315      101      416
        Green     108       32      140
        Total     423      133      556

b) We obtain the following table:

Characteristic     Obs. Freq.   Exp. Freq.   Deviation   Cell Chi-square
Yellow, Round         315         312.75        2.25          0.016
Green, Round          108         104.25        3.75          0.135
Yellow, Wrinkly       101         104.25       -3.25          0.101
Green, Wrinkly         32          34.75       -2.75          0.218
Total                 556         556.00        0             0.470

p-value = $P(\chi^2(1) > 0.47) = 0.49$. No reason to reject Mendel's theory. (It's interesting that the famous statistician R. A. Fisher concluded that Mendel's results were far too perfect, indicating that adjustments had been made to the data to make observations fit the hypothesis.)


EX 110 


a) In the Poisson distribution $E(Y) = \lambda = V(Y)$. The corresponding sample estimates are $\bar y = 54.6$ and $s^2 = 251.8$. The variance is more than four times larger than the mean, which seems strange.

b) $H_0: Y \sim$ Poisson$(\lambda)$

The test statistic is $X^2 = \sum \frac{(Y_i - \bar Y)^2}{\bar Y} = \frac{(81 - 54.6)^2}{54.6} + \ldots + \frac{(31 - 54.6)^2}{54.6} = 41.5$

p-value = $P(\chi^2(10 - 1) > 41.5) < 0.005 \Rightarrow$ Reject $H_0$.
EX 111 Let Y = 'Number on sick leave per day'. $H_0: Y \sim$ Poisson$(\lambda)$

The pf is $P(Y = y) = \frac{\lambda^y}{y!}e^{-\lambda}$, $\hat\lambda = \bar y = 1.0$.

Expected frequencies:

$30P(Y = 0) = 30\frac{1^0}{0!}e^{-1} = 11.04$, $30P(Y = 1) = 30\frac{1^1}{1!}e^{-1} = 11.04$,

$30P(Y = 2) = 30\frac{1^2}{2!}e^{-1} = 5.52$, $30P(Y \ge 3) = 30\{1 - P(Y \le 2)\} = 30 - 11.04 - 11.04 - 5.52 = 2.40$

The latter is computed to avoid small expected frequencies (cf. comment to EX 71).

$X^2 = \frac{(12 - 11.04)^2}{11.04} + \frac{(10 - 11.04)^2}{11.04} + \frac{(6 - 5.52)^2}{5.52} + \frac{(2 - 2.40)^2}{2.40} = 0.29$

p-value = $P(\chi^2(4 - 1 - 1) > 0.29) >> 0.10$. There is no reason to reject the Poisson model.
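The goodness-of-fit computation in EX 111 can be reproduced with a short script. This is a sketch under the figures quoted in the solution (n = 30, estimated rate 1.0, observed frequencies 12, 10, 6, 2); the variable names are illustrative. For 2 degrees of freedom the Chi-square tail probability has the closed form exp(-x/2), which avoids any external library.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson(lam) variable."""
    return lam ** y * exp(-lam) / factorial(y)

n, lam = 30, 1.0              # sample size and estimated rate from the solution
observed = [12, 10, 6, 2]     # days with 0, 1, 2 and >= 3 persons on sick leave

expected = [n * poisson_pmf(y, lam) for y in range(3)]
expected.append(n - sum(expected))   # pooled cell Y >= 3

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = 4 cells - 1 - 1 estimated parameter = 2, and P(chi2(2) > x) = exp(-x/2).
p_value = exp(-x2 / 2)
print(round(x2, 2), round(p_value, 2))
```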

EX 112 $H_0: Y \sim N(\mu, \sigma^2)$
From the n = 48 observations we obtain the estimates $\hat\mu = 55.125$ and $\hat\sigma = 18.96$.

Approximate cell probabilities in the four cells are (Z denotes a N(0,1)-variable):

$P(Y < 40) = P\left(Z < \frac{40 - 55.125}{18.96}\right) = 0.2125$

$P(40 < Y < 60) = P\left(Z < \frac{60 - 55.125}{18.96}\right) - P\left(Z < \frac{40 - 55.125}{18.96}\right) = 0.3890$

$P(60 < Y < 80) = P\left(Z < \frac{80 - 55.125}{18.96}\right) - P\left(Z < \frac{60 - 55.125}{18.96}\right) = 0.3038$

$P(Y > 80) = 1 - P(Y < 80) = 1 - P\left(Z < \frac{80 - 55.125}{18.96}\right) = 0.0948$

Multiplying these probabilities by n = 48 gives the expected cell frequencies.

$X^2 = \frac{(11 - 10.2)^2}{10.2} + \frac{(18 - 18.7)^2}{18.7} + \frac{(14 - 14.6)^2}{14.6} + \frac{(5 - 4.5)^2}{4.5} = 0.17$

p-value = $P(\chi^2(4 - 2 - 1) > 0.17) = 0.68$. No reason to reject $H_0$.
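The cell probabilities in EX 112 can be recomputed from the fitted normal distribution. The sketch below assumes the observed counts 11, 18, 14 and 5 as they appear in the test statistic; because it uses unrounded expected frequencies, the statistic comes out slightly smaller than the hand-calculated 0.17, with the same conclusion.

```python
from math import erfc, sqrt
from statistics import NormalDist

n, mu_hat, sigma_hat = 48, 55.125, 18.96   # estimates quoted in the solution
cuts = [40, 60, 80]                        # cell boundaries
observed = [11, 18, 14, 5]                 # counts read from the test statistic

fitted = NormalDist(mu_hat, sigma_hat)
cdf = [0.0] + [fitted.cdf(c) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]   # four cell probabilities
expected = [n * p for p in probs]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = 4 - 2 - 1 = 1, and P(chi2(1) > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2)).
p_value = erfc(sqrt(x2 / 2))
print([round(p, 4) for p in probs], round(x2, 2), round(p_value, 2))
```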
EX 113 $H_0: p_1 = p_2 = p_3 = p_4 (= p)$
We get the following proportions of born boys

Season        Spring    Summer    Autumn    Winter    Over all ($\hat p$)
Proportion    0.51780   0.51707   0.51414   0.51475   0.516055

$\sum_{i=1}^{4} n_i(\hat p_i - \hat p)^2 = 0.054351 + 0.015844 + 0.052297 + 0.025490 = 0.147982$

$X^2 = \frac{0.147982}{0.516055(1 - 0.516055)} = 0.59 \Rightarrow$ p-value = $P(\chi^2(4 - 1) > 0.59) >> 0.10$. There is no reason to reject $H_0$, even though the sample is very large.


EX 114 For Group 1 and Group 2 $X^2 = 0$, whereas for Group (1+2) $X^2 = 5.01 \Rightarrow$ p-value < 0.05. In the latter case the combination of data from tables with unequal proportions and marginal frequencies has created an impression of association which in fact does not exist.


This example illustrates that it is possible to ‘create non-significance’ by searching for sub-groups where no 
association is found. On the other hand, you may find significant associations in sub groups while no significant 


association is found in the total group. 


The problem can be settled by a clear definition of the population to be studied. 
EX 115

a) The proportion positive among husbands is roughly twice as large as among wives, 0.30 and 0.16, respectively. But is the difference significant?

Using the Chi-square principle in the form of McNemar's test gives $X^2 = \frac{(24 - 10)^2}{24 + 10} = 5.76 \Rightarrow$ p-value < 0.05 (p-value = 0.016). There is a significant difference.

To apply the LR test (cf. EX 78) we need the estimates $\hat p = \frac{24 + 10}{2 \cdot 100} = 0.17$, $\hat p_{12} = 0.24$, $\hat p_{21} = 0.10$.

$-2\ln\Lambda = -2\{(24 + 10)\ln(0.17) - 24\ln(0.24) - 10\ln(0.10)\} = 5.9340 \Rightarrow$ p-value < 0.05 (p-value = 0.015)

b) The ordinary Chi-square test of independency yields $X^2 = 0.48 \Rightarrow$ p-value = $P(\chi^2(1) > 0.48) >> 0.10$. The opinions of husbands and wives are independent.
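McNemar's statistic in a) depends only on the two discordant counts (24 and 10), so it is easy to verify. A minimal sketch, with the helper name `mcnemar` chosen here for illustration:

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """McNemar's Chi-square statistic and p-value from the two discordant counts."""
    x2 = (b - c) ** 2 / (b + c)
    # P(chi2(1) > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2))
    p_value = erfc(sqrt(x2 / 2))
    return x2, p_value

x2, p = mcnemar(24, 10)
print(round(x2, 2), round(p, 3))
```

The same function applied to the discordant counts 20 and 10 of EX 124 a) gives a statistic of about 3.33.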


EX 116
$H_0$: No association between Working capacity and Type of training.

The total Chi-square measure is $X^2 = 65.71 \Rightarrow$ p-value << 0.001. There is thus a strong association between the two factors.

The next step is to search for the combination of factors that are 'most responsible' for the high Chi-square measure. This can be done by considering the table of deviations and cell Chi-square measures and then applying the Bonferroni-Holm principle described in Ch. 6.4.

Table showing Deviation / Cell Chi-square

              Low              Medium             High
Physical      -38.4 / 9.35     35.75 / 28.88      2.62 / 0.37
Activation    43.26 / 5.85     -39.91 / 17.71     -3.36 / 0.30
Education     -4.90 / 0.86     4.16 / 2.20        0.74 / 0.17

From this we get the table of ranked Cell Chi-square measures

i                    1         2         3         4
Cell Chi-square      28.88     17.71     9.35      5.85
p-value              <0.001    <0.001    0.0022    0.015
0.05/(3·3-i+1)       0.0056    0.0063    0.0071    0.0083

Here the first three p-values are smaller than the value in the bottom row. The corresponding Cell Chi-square measures are thus significant after adjustment for multiple significance. There is an over-representation for the combination 'Medium working capacity' x 'Physical training' and also an under-representation for the combinations 'Medium working capacity' x 'Activation' and 'Low working capacity' x 'Physical training'.
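The step-down comparison in the last table can be sketched as follows. The function `holm_significant` is a hypothetical helper written for this illustration; it compares the i-th smallest p-value with 0.05/(9 - i + 1) and stops at the first failure. Only the four largest Cell Chi-square values are passed in, which is sufficient here because the remaining five cells have smaller statistics.

```python
from math import erfc, sqrt

def chi2_1df_sf(x):
    """P(chi2(1) > x), using chi2(1) = Z**2."""
    return erfc(sqrt(x / 2))

def holm_significant(p_values, n_tests, alpha=0.05):
    """Return the p-values declared significant by the Bonferroni-Holm step-down rule."""
    accepted = []
    for i, p in enumerate(sorted(p_values), start=1):
        if p >= alpha / (n_tests - i + 1):
            break                      # stop at the first non-significant p-value
        accepted.append(p)
    return accepted

cells = [28.88, 17.71, 9.35, 5.85]     # four largest Cell Chi-square values of the 3 x 3 table
p_values = [chi2_1df_sf(x) for x in cells]
significant = holm_significant(p_values, n_tests=9)
print(len(significant))
```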
EX 117 The hypothesis to test is $H_0: p = 0.4$.

An estimate of $p$ is $\hat p = \frac{\sum y_i}{\sum n_i} = \frac{4 + \ldots + 6}{10 + \ldots + 12} = 0.30$ (cf. EX 46 b)) and an estimate of the variance of the latter is

$\hat V(\hat p) = \frac{\hat p(1 - \hat p)}{\sum n_i} = \frac{0.30(1 - 0.30)}{100} = 0.0021$.

As a test statistic we take

$T = \frac{\hat p - 0.4}{\sqrt{\hat V(\hat p)}} = \frac{0.30 - 0.4}{\sqrt{0.0021}} = -2.182$.

From EX 23 c) it follows that we can use the normal distribution to compute the (two-sided) p-value $2 \cdot P(Z > 2.182) = 2 \cdot 0.015 = 0.03$. The conclusion is that University employees are on sick leave to a lesser extent than the rest of the population.


EX 118 Introduce the notations:

X(2) = Number of accidents during 2 weeks before the renewal, with intensity $\lambda_X$

Y(3) = Number of accidents during 3 weeks after the renewal, with intensity $\lambda_Y$

then $(Y(3) \mid X(2) + Y(3) = n = 15) \sim$ Binomial$\left(n = 15, p = \frac{3\lambda_Y}{2\lambda_X + 3\lambda_Y}\right)$

We want to test $H_0: \lambda_X = \lambda_Y \Leftrightarrow p = \frac{3}{5}$. The observed number of accidents is 5 and therefore a one-sided p-value is obtained by

$P(Y(3) \le 5) = \binom{15}{0}(3/5)^0(2/5)^{15} + \ldots + \binom{15}{5}(3/5)^5(2/5)^{10} = 0.0338$.

The computation of the last expression is simplified by using tables over the Binomial distribution, e.g. Table 1 p839 in Wackerly et al 2007.

For a two-tailed test we also have to compute the probability of extremely large values. Since the expected number of accidents is $n \cdot p = 15 \cdot 3/5 = 9$, we compute

$P(Y(3) \ge 13) = \binom{15}{13}(3/5)^{13}(2/5)^2 + \binom{15}{14}(3/5)^{14}(2/5)^1 + \binom{15}{15}(3/5)^{15}(2/5)^0 = 0.0271$.

The p-value for a two-tailed test is 0.0338 + 0.0271 > 0.05 and the conclusion is that the effect of renewed equipment is not significant.
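Instead of tables, the two Binomial tail probabilities can be summed directly. A short sketch using only the standard library (variable names are illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p0 = 15, 3 / 5
lower_tail = sum(binom_pmf(k, n, p0) for k in range(0, 6))     # P(Y <= 5)
upper_tail = sum(binom_pmf(k, n, p0) for k in range(13, 16))   # P(Y >= 13)
two_sided = lower_tail + upper_tail
print(round(lower_tail, 4), round(upper_tail, 4), round(two_sided, 4))
```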


EX 119 $H_0: p_i = 1/4$. The sum of observations is 80, so $n/k = 80/4 = 20$. The test statistic is

$X^2 = \frac{(19 - 20)^2}{20} + \frac{(16 - 20)^2}{20} + \frac{(20 - 20)^2}{20} + \frac{(25 - 20)^2}{20} = 2.10 \Rightarrow$ p-value = $P(\chi^2(4 - 1) > 2.10) > 0.10$

$H_0$ is not rejected, the observations may be generated by the same Poisson variable.
EX 120 The likelihood is $L = \prod_{i=1}^{n} \lambda e^{-\lambda y_i} = \lambda^n e^{-\lambda n \bar y}$, with $\hat\lambda = 1/\bar y$ (cf. EX 41).

The LR is $\Lambda = \frac{L_0}{\hat L} = \frac{\lambda_0^n e^{-\lambda_0 n \bar y}}{(1/\bar y)^n e^{-n}} = (\lambda_0 \bar y)^n e^{-\lambda_0 n \bar y + n} = (\lambda_0 \bar y)^n e^{n(1 - \lambda_0 \bar y)}$. $H_0$ is rejected for small values of $\Lambda$. Which values of $\lambda_0 \bar y$ will make $\Lambda$ small? To answer this consider the function $g(x) = x^n e^{n(1 - x)}$.

$\ln(g(x)) = n\ln(x) + n(1 - x)$ which has max/min for the same values as $g(x)$ (cf. Ch. 2.3.3).

$\frac{d\ln(g(x))}{dx} = n\left(\frac{1}{x} - 1\right) = 0 \Rightarrow x = 1$, $\frac{d^2\ln(g(x))}{dx^2} = -\frac{n}{x^2} < 0 \Rightarrow$ Local max for $x = \lambda_0 \bar y = 1$. Thus $H_0$ is rejected for extremely large or small values of $\lambda_0 \bar y$, or equivalently for large or small values of $\lambda_0 \sum Y_i$, but how large or small?
To this end we use property (5) in Ch. 2.2.2.
$Y_i \sim$ Exponential$(\lambda) \Rightarrow \sum_{i=1}^{n} Y_i \sim$ Gamma$(n, \lambda) \Rightarrow 2\lambda\sum_{i=1}^{n} Y_i \sim \chi^2(2n)$. By using the latter relation probabilities can easily be computed from the Chi-square distribution. The RR is thus of the form $2\lambda_0 \sum Y_i < C_1$ or $2\lambda_0 \sum Y_i > C_2$, where $C_1$ and $C_2$ are constants that are determined in the following way:
The test is two-sided at the 5% level and n = 10 which yields

$P(\chi^2(2 \cdot 10) < C_1) = 0.025 \Rightarrow C_1 = 9.5908$ and $P(\chi^2(2 \cdot 10) > C_2) = 0.025 \Rightarrow C_2 = 34.1696$.

The RR is $\sum Y_i < \frac{9.5908}{2\lambda_0}$ or $\sum Y_i > \frac{34.1696}{2\lambda_0}$.

Notice that the sample has not yet been collected, nor has the value of $\lambda_0$ been specified. Once this has been done the test can be performed.
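The two critical values can be recovered without tables. The sketch below relies on the fact that for an even number of degrees of freedom 2k the Chi-square upper tail equals a finite Poisson sum, exp(-x/2) times the sum of (x/2)^j / j! for j < k, and solves for the quantile by bisection. Both function names are introduced here for illustration.

```python
from math import exp

def chi2_sf_even(x, df):
    """P(chi2(df) > x) for even df, via the finite Poisson-sum identity."""
    assert df % 2 == 0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2) / j
        total += term
    return exp(-x / 2) * total

def chi2_quantile_even(upper_tail, df, lo=0.0, hi=200.0):
    """Solve P(chi2(df) > c) = upper_tail for c by bisection (the tail decreases in c)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_even(mid, df) > upper_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c1 = chi2_quantile_even(0.975, 20)   # lower critical value C1
c2 = chi2_quantile_even(0.025, 20)   # upper critical value C2
print(round(c1, 4), round(c2, 4))
```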
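EX 121 The Likelihood without restrictions on the parameters is

$L = \prod_{i=1}^{n_X} \lambda_X e^{-\lambda_X x_i} \cdot \prod_{i=1}^{n_Y} \lambda_Y e^{-\lambda_Y y_i} = \lambda_X^{n_X}\lambda_Y^{n_Y} e^{-\lambda_X \sum x_i - \lambda_Y \sum y_i}$ and the Likelihood under $H_0$ is

$L_0 = \lambda^{(n_X + n_Y)} e^{-\lambda(\sum x_i + \sum y_i)}$.

$\ln L = n_X\ln\lambda_X + n_Y\ln\lambda_Y - \lambda_X\sum x_i - \lambda_Y\sum y_i \Rightarrow \frac{\partial \ln L}{\partial\lambda_X} = \frac{n_X}{\lambda_X} - \sum x_i = 0 \Rightarrow \hat\lambda_X = \frac{n_X}{\sum x_i}$.

Similarly, $\hat\lambda_Y = \frac{n_Y}{\sum y_i}$.

$\ln L_0 = (n_X + n_Y)\ln\lambda - \lambda(\sum x_i + \sum y_i) \Rightarrow \hat\lambda = \frac{n_X + n_Y}{\sum x_i + \sum y_i}$.

$\Lambda = \frac{\hat L_0}{\hat L} = \frac{\hat\lambda^{(n_X + n_Y)} e^{-(n_X + n_Y)}}{\hat\lambda_X^{n_X}\hat\lambda_Y^{n_Y} e^{-(n_X + n_Y)}} = \frac{\hat\lambda^{(n_X + n_Y)}}{\hat\lambda_X^{n_X}\hat\lambda_Y^{n_Y}}$

$\hat\lambda_X = \frac{40}{20} = 2.00$, $\hat\lambda_Y = \frac{60}{40} = 1.50$, $\hat\lambda = \frac{40 + 60}{20 + 40} = 1.67 \Rightarrow$

$-2\ln\Lambda = -2(-0.9712) = 1.9425 \Rightarrow$ p-value = $P(\chi^2(1) > 1.9425) = 0.16 > 0.05$.
There is no reason to reject $H_0$.
The link between the Poisson process and exponentially distributed intervals was stated in property (4) Ch. 2.2.2. In EX 86 it was shown how to test the equality between two Poisson rates based on count data (frequency of occurrences). In the present example we have shown how to perform the same test based on interval data.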
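The LR statistic of EX 121 can be reproduced numerically. In the sketch below the interval sums 20 and 40 are taken from the solution, while the sample sizes 40 and 60 are an assumption inferred from the reported estimates 2.00 and 1.67 and from the value -0.9712 of the log likelihood ratio.

```python
from math import erfc, log, sqrt

# Sum of intervals from the solution; sample sizes are inferred (assumption).
n_x, sum_x = 40, 20.0
n_y, sum_y = 60, 40.0

lam_x = n_x / sum_x                        # unrestricted ML estimates
lam_y = n_y / sum_y
lam_0 = (n_x + n_y) / (sum_x + sum_y)      # ML estimate under H0: equal rates

# The exponential factors cancel, so ln(Lambda) only involves the rate estimates.
log_lr = (n_x + n_y) * log(lam_0) - n_x * log(lam_x) - n_y * log(lam_y)
stat = -2 * log_lr

# One restriction, so the statistic is compared with chi2(1).
p_value = erfc(sqrt(stat / 2))
print(round(log_lr, 4), round(stat, 4), round(p_value, 2))
```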
EX 122

a) The unrestricted Likelihood is $L = \prod_{i=1}^{n} (1 - p)^{y_i - 1} p = (1 - p)^{\sum y_i - n} p^n$

$\ln L = (\sum y_i - n)\ln(1 - p) + n\ln p \Rightarrow \frac{d\ln L}{dp} = -\frac{\sum y_i - n}{1 - p} + \frac{n}{p} = 0 \Rightarrow \hat p = \frac{n}{\sum y_i} = \frac{1}{\bar y}$

The Likelihood under $H_0$ is $L_0 = (1/2)^{\sum y_i - n}(1/2)^n = (1/2)^{\sum y_i}$. (No parameters need to be estimated.)

$\Lambda = \frac{(1/2)^{\sum y_i}}{(1 - \hat p)^{\sum y_i - n}\hat p^n}$ = [After some simplification, but not needed.] = $\frac{(\bar y/2)^{\sum y_i}}{(\bar y - 1)^{\sum y_i - n}}$

$-2\ln\Lambda = -2\{\sum y_i \cdot \ln(\bar y/2) - (\sum y_i - n)\ln(\bar y - 1)\}$ which is distributed $\chi^2(1 - 0)$ under $H_0$.

b) $-2\ln\Lambda = -2\{80\ln(0.8) - (80 - 50)\ln(8/5 - 1)\} = 5.0534$, p-value = $P(\chi^2(1) > 5.0534) = 0.02 < 0.05$.

$H_0$ is rejected and since $\hat p = 50/80 = 0.625 > 0.5$ we draw the conclusion that $p$ is significantly larger than 1/2.


c) There are many situations where we can collect data on the variable Y, = ‘Number of trials until an (0, 1) -event 
occurs for the first time’ in order to test the hypothesis that p =1/2. An example is a sequence of ups and downs 
on the stock market. 


EX 123

a) From the data we get $n = 16$, $\bar y = 2.2375$, $s^2 = 7.3265$. Thus,

$T = \frac{2.2375 - 0}{\sqrt{7.3265/16}} = 3.307 \Rightarrow$ p-value = $2 \cdot P(T(16 - 1) > 3.307) = 2 \cdot 0.0024 = 0.004$. The conclusion is that the weight loss program had a significant positive effect.

b) Define T = 'Number of positive signs'. Under $H_0: P(+) = P(-) = 1/2$ the variable T is distributed Binomial(n = 16, p = 1/2). In the data we observe T = 13 and a one-sided p-value is

$P(T \ge 13) = \sum_{y=13}^{16}\binom{16}{y}(1/2)^y(1/2)^{16-y} = (1/2)^{16}\left\{\binom{16}{13} + \binom{16}{14} + \binom{16}{15} + \binom{16}{16}\right\} = 0.010$

However, to compute a two-sided p-value we should also consider the possibility that the outcome may be in the 'opposite direction'. Since the expected value of T is 16/2 = 8, the 'opposite direction' consists of the outcomes $T \le 3$.

$P(T \le 3) = \sum_{y=0}^{3}\binom{16}{y}(1/2)^y(1/2)^{16-y} = (1/2)^{16}\left\{\binom{16}{0} + \binom{16}{1} + \binom{16}{2} + \binom{16}{3}\right\} = 0.010$. (This result is to be expected since the Binomial distribution is symmetric for p = 1/2.)

The two-sided p-value is thus 0.02, which is much larger than 0.004 in a), but still less than 0.05.

c) $T \sim$ Binomial$(n = 16, p = 1/2) \Rightarrow P(T \ge 13) = 1 - P(T < 13) \approx 1 - P\left(Z < \frac{13 - 16/2 - 0.5}{\sqrt{16/4}}\right) = 1 - P(Z < 2.25) = P(Z > 2.25) = 0.0122$.

Similarly $P(T \le 3) \approx P\left(Z < \frac{3 - 16/2 + 0.5}{\sqrt{16/4}}\right) = P(Z < -2.25) = 0.0122$ because of symmetry.

The two-sided p-value is $2 \cdot 0.0122 = 0.02$.
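The exact sign-test probability in b) and the continuity-corrected normal approximation in c) can be compared side by side. A minimal sketch with illustrative variable names:

```python
from math import comb, sqrt
from statistics import NormalDist

n, t_obs = 16, 13

# Exact: P(T >= 13) under Binomial(16, 1/2); every outcome has probability (1/2)**16.
exact = sum(comb(n, k) for k in range(t_obs, n + 1)) / 2 ** n

# Normal approximation with continuity correction: mean n/2, variance n/4.
z = (t_obs - 0.5 - n / 2) / sqrt(n / 4)
approx = 1 - NormalDist().cdf(z)

print(round(exact, 4), round(approx, 4), round(2 * exact, 2), round(2 * approx, 2))
```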


EX 124

a) Introduce the notations $p_S$ and $p_M$ for the proportion of products that are classified 'Bad' by State authorities and Municipal authorities, respectively. The observed difference is

$\hat p_S - \hat p_M = \frac{(20 + 20)}{100} - \frac{(20 + 10)}{100} = 0.10$, but is the difference significant?

$H_0: p_S = p_M$

McNemar's test (Cf. EX 71) yields: $X^2 = \frac{(20 - 10)^2}{20 + 10} = 3.33 \Rightarrow$ p-value = $P(\chi^2(1) > 3.33) = 0.068$.

We can't reject the null hypothesis at the 5% level.

b) $H_0$: Independency between the two types of classifications. (It's close to an insult to set up this hypothesis.)

The ordinary Chi-square test of independency yields:

$X^2 = \frac{(20 - 12)^2}{12} + \frac{(10 - 18)^2}{18} + \frac{(20 - 28)^2}{28} + \frac{(50 - 42)^2}{42} = 12.7 \Rightarrow$ p-value = $P(\chi^2(1) > 12.7) << 0.005$

There is strong reason to reject $H_0$.


EX 125 The p.f. is $p(y) = \frac{\lambda^y}{y!}e^{-\lambda}$ with $\hat\lambda = \frac{0 \cdot 103 + 1 \cdot 143 + \ldots + 6 \cdot 2}{103 + 143 + \ldots + 2} = 1.3225$. (This is simply $\hat\lambda = \bar y$.)

Now, $\hat p(0) = e^{-\hat\lambda} = 0.2667$, $\hat p(1) = \hat\lambda e^{-\hat\lambda} = 0.3527$, and so on. In this way we get the following table, where Y = Observed frequency, $n\hat p(y)$ = Expected frequency and n = 400.

y                                      0       1       2       3       4       5       6       Total
$\hat p(y)$                            .2667   .3527   .2332   .1028   .0337   .0090   .0019   1
$n\hat p(y)$                           106.7   141.1   93.3    41.1    13.5    3.6     0.7     400
Y                                      103     143     98      42      8       4       2       400
$(Y - n\hat p(y))^2 / n\hat p(y)$      0.127   0.026   0.237   0.020   2.241   0.044   2.414   5.109

p-value = $P(\chi^2(7 - 1 - 1) > 5.109) = 0.40$, so there is no reason to reject the Poisson distribution. The degrees of freedom in the Chi-square distribution are due to the fact that there are 7 cells and 1 parameter has been estimated.
EX 126 A 2 x 3 frequency table is


Morning Day Night Total 


Defective 12 10 23 45 
Not defective 188 190 177 555 
Total 200 200 200 600 


The table with Deviation / Cell Chi-Square (Cf. Ch. 6.2.1) is 


Morning Day Night 
Defective -3/0.6 -5 / 1.6667 8 / 4.2667 
Not defective 3 /0.0486 5 /0.1351 -8 / 0.3459 


The total sum of Cell Chi-Square is 


$X^2 = 0.6 + \ldots + 0.3459 = 7.0631 \Rightarrow$ p-value = $P(\chi^2((3 - 1)(2 - 1)) > 7.0631) = 0.0293$. We reject the hypothesis of no association.

In the last table there is a large excess of observations (+8) in the cell Defective x Night. The p-value for this Cell Chi-Square is $P(\chi^2(1) > 4.2667) \approx 0.0389 < 0.05$. However, there are deviations in 6 cells to take account of. Since 0.0389 > 0.05/6 (Cf. Ch. 6.4.) we can't claim that there is a significant over-representation of observations in the cell Defective x Night after having adjusted p-values for multiple comparisons.
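The Cell Chi-square values and their total can be recomputed from the frequency table. The sketch below is illustrative; it uses the closed form exp(-x/2) for the Chi-square tail with 2 degrees of freedom and the error function for 1 degree of freedom.

```python
from math import erfc, exp, sqrt

table = [[12, 10, 23],        # Defective:     Morning, Day, Night
         [188, 190, 177]]     # Not defective: Morning, Day, Night

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

cell_chi2 = []
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        e = row_totals[i] * col_totals[j] / n     # expected frequency under independence
        cell_chi2.append((obs - e) ** 2 / e)

x2 = sum(cell_chi2)
p_value = exp(-x2 / 2)                            # df = (2-1)(3-1) = 2

# Largest single cell (Defective x Night), judged on chi2(1) against 0.05/6.
p_cell = erfc(sqrt(max(cell_chi2) / 2))
print(round(x2, 4), round(p_value, 4), round(p_cell, 4), p_cell < 0.05 / 6)
```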


EX 127 There are three differences between proportions to consider. The largest difference is for the proportion defective in Night and Day, 23/200 - 10/200 = 0.065. The corresponding test statistic for testing that the true difference is zero is (Cf. EX 83 a).)

$T = \frac{0.065}{\sqrt{\hat p(1 - \hat p)(1/200 + 1/200)}} = \left[\hat p = \frac{23 + 10}{200 + 200}\right] = 2.36 \Rightarrow$ p-value = $2P(Z > 2.36) = 0.018 < 0.05$.

However, since there are three comparisons to make we should require that 0.018 is less than 0.05/3 ≈ 0.017 (Cf. Ch. 6.4.). The difference between defectives in Night and Day is approximately significant after adjustment for multiple comparisons. The other two differences are not.
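The test of equal Binomial proportions used here (and again in EX 128, 129, 132 and 133) can be written as a small reusable function. This is a sketch; the name `two_proportion_z` is introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample z statistic and two-sided p-value for H0: p1 = p2 (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Night (23 of 200 defective) against Day (10 of 200 defective).
z, p = two_proportion_z(23, 200, 10, 200)
print(round(z, 2), round(p, 3), round(0.05 / 3, 3))
```

The p-value lands just above 0.05/3, which is why the solution calls the difference only approximately significant after adjustment.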


EX 128 Let $p_I$ and $p_F$ be the proportion left-handed among identical and fraternal twins, respectively. We want to test $H_0: p_I = p_F (= p)$. The test statistic for this is

$T = \frac{\hat p_I - \hat p_F}{\sqrt{\hat p(1 - \hat p)(1/n_I + 1/n_F)}}$ which is distributed N(0,1) in large samples. $\hat p_I = \frac{41}{248}$, $\hat p_F = \frac{18}{246}$, $\hat p = \frac{41 + 18}{248 + 246}$

$\Rightarrow T = 3.153 \Rightarrow$ p-value = $2P(Z > 3.153) = 0.002$. There is thus a strongly significant difference between the proportions.
EX 129 We give two types of solutions, one 'Straight' as in EX 85 a) and one in a 'Bio-statistical style'.

Straight solution: The two estimated proportions to be compared are $\hat p_1 = \frac{1}{20000}$ and $\hat p_2 = \frac{114}{473000}$. With $\hat p = \frac{1 + 114}{20000 + 473000}$,

$T = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{20000} + \frac{1}{473000}\right)}}$ takes the value -1.733

$\Rightarrow$ p-value = $P(Z > 1.733) = 0.042 < 0.05$. (In this case it seems reasonable to use a one-sided test.)

Hint: If you have problems performing calculations with very small numbers in both numerator and denominator in T, just multiply T by e.g. 1000/1000.

Solution in a Bio-statistical style: Let Y = 'Number of polio cases among 20000 vaccinated children'.
Under the hypothesis that the SALK vaccine has no effect $Y \sim$ Binomial$\left(n = 20000, p = \frac{114}{473000}\right)$ and approximately $Y \sim$ Poisson$(\lambda = np = 4.8)$. The latter variable has expected value 4.8 and therefore the hypothesis of no effect is rejected for small values of Y.

p-value = $P(Y \le 1) = \sum_{y=0}^{1}\frac{4.8^y}{y!}e^{-4.8} = (1 + 4.8)e^{-4.8} = 0.046 < 0.05$.

In this case the group under study (vaccinated children) has been exposed to a standard value of a parameter (p = 114/473000) and the outcome of this is evaluated. Such an approach is very common in bio-statistical studies, especially in epidemiology.


EX 130 Table of Deviation / Cell Chi-Square: 


Low Middle High 


Cheap -14.3/ 3.93 20.0/ 5.86 -5.6/ 0.9 


Expensive 14.3/ 4.31 -20/ 6.44 5.6/ 1.0 


The total Chi-square measure is $X^2 = 3.93 + \ldots + 1.0 = 22.4 \Rightarrow$ p-value = 0.00001 << 0.05. There is thus strong evidence against independency.

The largest Cell Chi-Square is 6.44 and p-value = $P(\chi^2(1) > 6.44) = 0.011 < 0.05$. For a multiple comparison in 6 cells it is required that the latter is less than 0.05/6 = 0.008, which is nearly true. No further significant patterns can be seen.


The conclusion is that there is a significant under-representation of Middle- class families with Expensive electronic 
equipment. 


EX 131 Let Y = 'Yearly number of cases in Malmö' ~ Binomial(n = 110000, p). We want to test $H_0: p = 35/100000$. Since $p$ is small, $Y \sim$ Poisson$(\lambda = 110000 \cdot 35/100000 = 38.5)$ under $H_0$.

The expected value of Y is 38.5 and the observed value is 60. It is therefore natural to compute the p-value as

$P(Y \ge 60) = \sum_{y=60}^{\infty}\frac{38.5^y}{y!}e^{-38.5}$, but this is a heavy task. Instead we use the fact that for large $\lambda$ a Poisson variable can be approximated by a $N(\lambda, \lambda)$-variable (Cf. Ch. 2.2.2).

p-value = $P(Y \ge 60) \approx P\left(Z > \frac{60 - 38.5}{\sqrt{38.5}}\right) = P(Z > 3.47) = 0.0002$

The conclusion is that inhabitants in Malmö have a significantly higher risk for malignant melanoma than the rest of the Swedish population.
EX 132

a) As the question is formulated, a one-sided test seems to be appropriate. On the other hand, if the problem was to find out which of the two importers is best, a two-sided test is preferable.

b) The two estimated proportions are $\hat p_1 = \frac{19}{200} = 0.095$ and $\hat p_2 = \frac{10}{200} = 0.050$. To test $H_0: p_1 = p_2 (= p)$ we use the test statistic

$T = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)(1/n_1 + 1/n_2)}}$, $\hat p = \frac{19 + 10}{200 + 200}$.

This gives $T = 1.735 \Rightarrow$ p-value = $P(Z > 1.735) = 0.0413 < 0.05$. The new importer is better!

c) We construct the following 2 x 2 table:

                        Quality
                    Bad   Acceptable   Total
Importer   Former    19      181        200
           New       10      190        200
           Total     29      371        400

$X^2 = 1.3966 + 0.1092 + 1.3966 + 0.1092 = 3.0114 \Rightarrow$ p-value = $P(\chi^2(1) > 3.0114) = 0.0826$.

Notice that the p-value in c) is twice that of b). The Chi-square test is by its nature a two-sided test.


It has been demonstrated that the test of equality between Binomial proportions in EX 83 a) is equivalent to the Chi- 
square test of independence if the test is two-sided. The Chi-square test can also be used as a one-sided test if the 
p-value is divided by 2. However, both tests are approximate and require that sample sizes are large. 
EX 133 We first consider one-sided vs. two-sided tests.


Most people would probably agree that vaccination could not have a negative effect on the state of health. The test 
should thus be one-sided. 


(However, there may be other opinions on this issue. Some might e.g. argue that vaccine can be contaminated.) 
The test for equality of two Binomial proportions yields 


$T = \frac{\frac{4}{90} - \frac{18}{66}}{\sqrt{\frac{22}{156}\left(1 - \frac{22}{156}\right)\left(\frac{1}{90} + \frac{1}{66}\right)}} = -4.0476 \Rightarrow$ p-value = $P(Z > 4.0476) = 0.000025$.

From the 2 x 2 table the following table over Deviation / Cell Chi-Square is constructed:

                  Diseased         Not diseased
Vaccinated        -8.7 / 5.9529    8.7 / 0.98
Not vaccinated    8.7 / 8.1176     -8.7 / 1.33

From this, $X^2 = 16.38 \Rightarrow$ p-value = $P(\chi^2(1) > 16.38) = 0.000050$. The latter is the result of a two-sided test. To get a one-sided p-value we simply divide it by 2.

In the last table there are two significant Cell Chi-Squares. $P(\chi^2(1) > 8.1176) = 0.0044 < \frac{0.05}{4} = 0.0125$ and

$P(\chi^2(1) > 5.9529) = 0.0147 < \frac{0.05}{3} = 0.0166$ (Cf. Ch. 6.4).


EX 134 


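a) $X^2 = 5.6197 \Rightarrow$ p-value = $P(\chi^2(1) > 5.6197) = 0.0178$ (two-sided test). For a one-sided test the p-value is halved, 0.0089.

b) p-value = $\frac{16! \cdot 129! \cdot 98! \cdot 47!}{145!}\left(\frac{1}{15! \cdot 1! \cdot 83! \cdot 46!} + \frac{1}{16! \cdot 0! \cdot 82! \cdot 47!}\right) = 0.0123$ (one-sided test)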
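The exact (Fisher) p-value in b) is a sum of two hypergeometric probabilities. The sketch below assumes the 2 x 2 table with cells 15, 1, 83 and 46 that is implied by the factorials in the formula, so the first row total is 16, the first column total is 98 and the grand total is 145.

```python
from math import comb

def hypergeom_pmf(a, row1, col1, n):
    """P(upper-left cell = a) given the margins of a 2 x 2 table."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)

row1, col1, n = 16, 98, 145          # margins implied by the factorials (assumption)

# One-sided p-value: the observed cell value 15 and the more extreme value 16.
p_one_sided = sum(hypergeom_pmf(a, row1, col1, n) for a in (15, 16))
print(round(p_one_sided, 4))
```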


EX 135 


a) We first test whether the population variances of the two bulbs are equal (Cf. EX 88.).

$F = \frac{(83)^2}{(62)^2} = 1.79 \Rightarrow$ p-value = $2 \cdot P(F(19, 19) > 1.79) = 0.22$. The variances can be considered equal.

The pooled sample variance is $s_p^2 = \frac{(20 - 1)(62)^2 + (20 - 1)(83)^2}{(20 - 1) + (20 - 1)} = 5367$.

The test statistic for testing that the two population means are equal is

$T = \frac{1128 - 1236}{\sqrt{5367(1/20 + 1/20)}} = -4.66 \Rightarrow$ p-value = $2 \cdot P(T(20 - 1 + 20 - 1) > 4.66) = 0.00004$. The difference is clearly significant.

b) Referring to EX 88, we get $T = \frac{1236 - 1128}{\sqrt{(62)^2/20 + (83)^2/20}} = 4.66 \Rightarrow$ p-value = $2 \cdot P(Z > 4.66) < 0.00001$.


EX 136 Let $Y_A$ and $Y_B$ be the yield from variety A and B, respectively. Form the difference $D = Y_A - Y_B$ and compute the estimates $\bar d = 1.33$ and $s_D^2 = 6.7882$. The hypothesis that the difference between the means is zero is the same as that $E(D) = 0$. This is tested by

$T = \frac{1.33 - 0}{\sqrt{6.7882/12}} = 1.77 \Rightarrow$ p-value = $2 \cdot P(T(12 - 1) > 1.77) = 0.10$. We can't claim that the difference is significant.


EX 137 The ranked series are:

A   22   28   38   40   40   41   44   46   48   51   58   60
B   12   15   18   21   23   24   29   35   36   43   54   80

The combined ranked series is easily shown to have the sample median $m = \frac{38 + 40}{2} = 39$. This leads to the 2 x 2 frequency table

          A     B     Total
>39       9     3     12
<39       3     9     12
Total     12    12    24

The hypothesis of interest is $H_0$: No association between Type of series and Distribution around sample median.

$X^2 = 6.0 \Rightarrow$ p-value = $P(\chi^2(1) > 6.0) = 0.0143$. The two series have significantly different medians.
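The median test can be reproduced directly from the two ranked series. A minimal sketch using only the standard library; variable names are illustrative.

```python
from math import erfc, sqrt
from statistics import median

a = [22, 28, 38, 40, 40, 41, 44, 46, 48, 51, 58, 60]
b = [12, 15, 18, 21, 23, 24, 29, 35, 36, 43, 54, 80]

m = median(a + b)                                   # combined sample median
above = [sum(v > m for v in s) for s in (a, b)]     # counts above the median in A and B
below = [len(s) - k for s, k in zip((a, b), above)]
table = [above, below]

row_totals = [sum(r) for r in table]
col_totals = [len(a), len(b)]
n = len(a) + len(b)

x2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n       # expected frequency under H0
        x2 += (table[i][j] - e) ** 2 / e

p_value = erfc(sqrt(x2 / 2))                        # P(chi2(1) > x2)
print(m, table, round(x2, 1), round(p_value, 4))
```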


EX 138

a) Remember the conditions for the Binomial distribution in Ch. 2.2.1, 'independent repetitions of the same experiment'. The taster thus has to spit out the beer after each tasting. He also has to reset his memory, so that he can't compare the taste of a new glass with the taste of the preceding one. Therefore, a so-called wash-out period is needed between tastings. The arrangement of the glasses is easily done by tossing a coin.

The hypothesis $H_0: p = 1/2$ means that we assume that the taster is merely guessing.

b) From the Binomial distribution we get $P(Y = 8) = \binom{8}{8}\frac{1}{2^8} = \frac{1}{2^8} = 0.004 < 0.05$,

$P(Y \ge 7) = P(Y = 7) + P(Y = 8) = \binom{8}{7}\frac{1}{2^8} + \frac{1}{2^8} = \frac{1}{2^8}(8 + 1) = 0.035 < 0.05$. For y smaller than 7 we get probabilities that are larger than 0.05. The smallest acceptable value of y is thus 7.

c) We have to solve x in the relation $P(Y \ge x) \approx P\left(Z \ge \frac{x - 100 \cdot (1/2)}{\sqrt{100(1/2)(1/2)}}\right) = 0.05 \Rightarrow \frac{x - 50}{\sqrt{100(1/2)(1/2)}} = 1.645 \Rightarrow$ x = 59 will be enough.


EX 139 


a) From EX 4 we get

y             0        1         2          3          4
Sample cdf    6/160    44/160    102/160    149/160    160/160
$p(y)$        0.056    0.235     0.374      0.265      0.070
$F_0(y)$      0.056    0.291     0.665      0.930      1.000

Here, $F_0(y) = p(0) + \ldots + p(y)$.

The largest absolute difference between the sample cdf and $F_0(y)$ is $D_{160} = |102/160 - 0.665| = 0.028$.

The critical value for a two-sided test at the 5% level is $1.36/\sqrt{160} = 0.11$ (Cf. Ch. 6.2.4) > 0.028 so there is no reason to reject the Binomial distribution.

b)

y                     0      1       2       3       4
Expected frequency    9.0    37.6    59.8    42.4    11.2
Observed frequency    6      38      58      47      11

Here, Expected frequency is $160 \cdot p(y)$, y = 0, 1, 2, 3, 4.

$X^2 = \frac{(6 - 9.0)^2}{9.0} + \frac{(38 - 37.6)^2}{37.6} + \frac{(58 - 59.8)^2}{59.8} + \frac{(47 - 42.4)^2}{42.4} + \frac{(11 - 11.2)^2}{11.2} = 1.56 \Rightarrow$

p-value = $P(\chi^2(5 - 1 - 1) > 1.56) = 0.67$ and there is again no reason to reject the Binomial distribution.
EX 140
a) The ranked series and sample cdf's are:

Day 1 (15 observations):

y             30.8     31.0     31.4     32.9     33.3     33.4     33.5
$S_{15}(y)$   0.067    0.133    0.200    0.267    0.400    0.467    0.533

y             33.7     34.3     34.4     34.8     34.9     36.2     37.0
$S_{15}(y)$   0.600    0.667    0.733    0.800    0.867    0.933    1.000

Here, 0.067 = 1/15, 0.133 = 2/15, and so on. Notice that there are two observations at y = 33.3.

Day 2 (10 observations):

y             28.4     28.7     30.2     30.7     31.3     31.9     32.1     32.4     32.8
$S_{10}(y)$   0.100    0.200    0.400    0.500    0.600    0.700    0.800    0.900    1.000

For $31.4 \le y < 32.9$, $S_{15}(y) = 0.20$ and for $y = 32.8$, $S_{10}(y) = 1.000$. The largest absolute difference between the two cdf's is $D_{15,10} = |0.20 - 1.00| = 0.80$.

From tables over the Smirnov two-sample distribution it is seen that the critical values for rejecting the hypothesis of equal cdf's are 15/30 (5% level) and 19/30 (1% level). The observed absolute difference of 0.80 is larger than both of these, so the hypothesis of equal population distributions can be rejected at least at the 1% level.
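The largest distance between the two sample cdf's can be found by evaluating both at every observed value. In the sketch below the tied observation 30.2 on Day 2 is an assumption read off from the jump of the sample cdf from 0.200 to 0.400; the tie at 33.3 on Day 1 is stated in the solution.

```python
day1 = [30.8, 31.0, 31.4, 32.9, 33.3, 33.3, 33.4, 33.5,
        33.7, 34.3, 34.4, 34.8, 34.9, 36.2, 37.0]
day2 = [28.4, 28.7, 30.2, 30.2, 30.7, 31.3, 31.9, 32.1, 32.4, 32.8]

def ecdf(sample, y):
    """Sample cdf: proportion of observations that are <= y."""
    return sum(v <= y for v in sample) / len(sample)

# The maximum difference between two step functions is attained at an observed value.
d = max(abs(ecdf(day1, y) - ecdf(day2, y)) for y in day1 + day2)
print(round(d, 2), d > 19 / 30)
```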


b) From the data we get Day 1: Mean = 33.66, Variance = 3.05
                        Day 2: Mean = 30.87, Variance = 2.28

As in EX 86 we first test whether the population variances of the two series are equal.

$F = \frac{3.05}{2.28} = 1.34 \Rightarrow$ p-value = $P(F(15 - 1, 10 - 1) > 1.34) = 0.34 \Rightarrow$ Population variances can be assumed to be equal. The pooled estimate of the common variance is

$s_p^2 = \frac{(15 - 1)3.05 + (10 - 1)2.28}{(15 - 1) + (10 - 1)} = 2.75$.

To test whether the population means are equal,

$T = \frac{33.66 - 30.87}{\sqrt{2.75(1/15 + 1/10)}} = 4.12 \Rightarrow$ p-value = $2 \cdot P(T(15 - 1 + 10 - 1) > 4.12) = 0.0004 \Rightarrow$ Population means differ significantly.

According to both tests in a) and b) the population distribution of viscosity has changed from one day to another. The test in b) gave a somewhat stronger rejection of equal means. On the other hand, the test in a) is free from assumptions about the population distribution. The latter is to be preferred in the absence of evidence that viscosity is normally distributed.


EX 141 Introduce the notations

+ if yield for A is higher than B
- if yield for B is higher than A
0 if yield for A equals yield for B

We now get the following pattern:

Area    1    2    3    4    5    6    7    8    9    10    11    12
Sign    +    -    +    -    +    -    +    +    +    -     +     0

Let Y = 'Number of minus signs' of n = 11. (The observation with a 0 is deleted.) Under the hypothesis of no difference in yield, $Y \sim$ Binomial$(n = 11, p = 1/2)$.

p-value = $P(Y \le 4) = \sum_{y=0}^{4}\binom{11}{y}(1/2)^y(1/2)^{11-y} = (1/2)^{11}\left\{\binom{11}{0} + \binom{11}{1} + \binom{11}{2} + \binom{11}{3} + \binom{11}{4}\right\} = \frac{1}{2048}(1 + 11 + 55 + 165 + 330) = 0.27$. A two-sided p-value is obtained by doubling the latter value. In neither case is the difference significant.
EX 142 We want to test the hypothesis of no association between the two series. Let $d_i$ be the difference between the ranks for the i:th student.

$\sum_{i=1}^{15} d_i^2 = 146$

$r_S = 1 - \frac{6}{15(15^2 - 1)} \cdot 146 = 0.7393$.

From Table 11, Appendix 3 in Wackerly et al one gets the critical value 0.525 (two-sided test, $\alpha = 0.05$). Since this is smaller than 0.7393 we reject the hypothesis of no association.


EX 143 Let $X_i$ = Value before, $Y_i$ = Value after for the i:th sample unit and put $D_i = X_i - Y_i$.

From the data we get $n = 10$, $\bar D = 0.03$, $S_D^2 = 0.001533$ ($S_D = 0.0392$).

$T = \frac{0.03 - 0}{0.0392/\sqrt{10}} = 2.42 \Rightarrow$ p-value = $2 \cdot P(Z > 2.42) = 0.0156$. There has been a significant decrease of the concentration due to the effect of the drug.
There are good reasons for assuming normality in this case. Notice that $\bar D$ involves a sum of 20 variables.

b) RR is $|\bar D - 0| > 1.96 \cdot 0.0392/\sqrt{n}$

$Pow(\mu_D) = P\left(Z > 1.96 - \frac{\mu_D\sqrt{n}}{0.0392}\right) + P\left(Z < -1.96 - \frac{\mu_D\sqrt{n}}{0.0392}\right)$

For n = 100 this will be larger than 0.90 if $|\mu_D| > 0.013$.
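The power function in b) is easy to evaluate numerically. The sketch below treats the standard deviation 0.0392 as known, as the solution does; the function name `power` and its argument names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power(mu_d, n, sd=0.0392, z_crit=1.96):
    """Power of the two-sided z test of H0: mu_D = 0 when the true mean difference is mu_d."""
    shift = mu_d * sqrt(n) / sd
    z = NormalDist()
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for mu in (0.0, 0.005, 0.010, 0.013):
    print(mu, round(power(mu, 100), 3))
```

At a true difference of 0 the function returns the significance level, and it is symmetric in the sign of the difference.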


EX 144 From EX 97-98 we obtain

$Pow(p) = P\left(Z > 1.96\frac{\sqrt{0.08(1 - 0.08)}}{\sqrt{p(1 - p)}} - \frac{(p - 0.08)\sqrt{n}}{\sqrt{p(1 - p)}}\right) + P\left(Z < -1.96\frac{\sqrt{0.08(1 - 0.08)}}{\sqrt{p(1 - p)}} - \frac{(p - 0.08)\sqrt{n}}{\sqrt{p(1 - p)}}\right)$

By inserting p = 0.15 one gets the requirement Pow(0.15) = 0.90, an equation in n that has to be solved by 'trial and error'. It is found that for n = 200, Pow(0.15) = 0.90031.

RR is $|\hat p - 0.08| > 1.96\sqrt{\frac{0.08(1 - 0.08)}{n}} = [n = 200] = 0.0376$.

Now the study can begin and the sample is collected. In practice it would be wise to include somewhat more than 200 persons in the sample due to non-responses or drop-outs.
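The 'trial and error' search for n can be automated. This is a sketch of the power function above, with the null value 0.08 and the alternative 0.15 taken from the solution; the function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power(p, n, p0=0.08, z_crit=1.96):
    """Power of the two-sided large-sample test of H0: p = p0 at the true proportion p."""
    half_width = z_crit * sqrt(p0 * (1 - p0) / n)   # rejection region: |p_hat - p0| > half_width
    sd = sqrt(p * (1 - p) / n)                      # standard deviation of p_hat under p
    z = NormalDist()
    upper = 1 - z.cdf((p0 + half_width - p) / sd)
    lower = z.cdf((p0 - half_width - p) / sd)
    return upper + lower

# Increase n until the power at p = 0.15 reaches 0.90.
n = 1
while power(0.15, n) < 0.90:
    n += 1
print(n, round(power(0.15, n), 5))
```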
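EX 145 $T = (n - 1)S^2 \sim \sigma^2\chi^2(n - 1)$. The RR is $T < a$ or $T > b$ and $\alpha = 0.05$.

a) $\alpha/2 = P(T < a \mid H_0) = P(\sigma_0^2\chi^2(n - 1) < a) = P\left(\chi^2(n - 1) < \frac{a}{\sigma_0^2}\right) \Rightarrow a = \sigma_0^2\chi^2_{\alpha/2}$,

$\alpha/2 = P(T > b \mid H_0) = P(\sigma_0^2\chi^2(n - 1) > b) = 1 - P\left(\chi^2(n - 1) < \frac{b}{\sigma_0^2}\right) \Rightarrow P\left(\chi^2(n - 1) < \frac{b}{\sigma_0^2}\right) = 1 - \alpha/2 \Rightarrow b = \sigma_0^2\chi^2_{1 - \alpha/2}$

Here we have used the notation $P(\chi^2 < \chi^2_p) = p$.
The RR is thus $T < \sigma_0^2\chi^2_{\alpha/2}$ or $T > \sigma_0^2\chi^2_{1 - \alpha/2}$.

$Pow(\sigma^2) = P(T < \sigma_0^2\chi^2_{\alpha/2}) + P(T > \sigma_0^2\chi^2_{1 - \alpha/2}) = P(\sigma^2\chi^2(n - 1) < \sigma_0^2\chi^2_{\alpha/2}) + P(\sigma^2\chi^2(n - 1) > \sigma_0^2\chi^2_{1 - \alpha/2}) = P\left(\chi^2(n - 1) < \frac{\sigma_0^2}{\sigma^2}\chi^2_{\alpha/2}\right) + P\left(\chi^2(n - 1) > \frac{\sigma_0^2}{\sigma^2}\chi^2_{1 - \alpha/2}\right)$, which is seen to be a function of $R = \sigma_0^2/\sigma^2$.

b) With n = 10 and $\alpha/2 = 0.025$, $\chi^2_{0.025} = 2.70039$ and $\chi^2_{0.975} = 19.0229$. The RR is

$T < \sigma_0^2 \cdot 2.70039$ or $T > \sigma_0^2 \cdot 19.0229$.

$Pow(R) = P(\chi^2(9) < R \cdot 2.70039) + P(\chi^2(9) > R \cdot 19.0229)$. This is a non-symmetric function. Some values are:

R         0.2     0.5     1.0     1.5     2.0     5.0
Pow(R)    0.92    0.39    0.05    0.09    0.20    0.94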

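EX 146 With $n = 10$ the RR is $\sum Y_i < \frac{9.59078}{2\lambda_0}$ or $\sum Y_i > \frac{34.1696}{2\lambda_0}$.

$\mathrm{Pow}(\lambda) = P\left(\sum Y_i < \frac{9.59078}{2\lambda_0}\right) + P\left(\sum Y_i > \frac{34.1696}{2\lambda_0}\right)$ = [Multiply each factor within braces with $2\lambda$ and use the result that $2\lambda\sum_{i=1}^{n} Y_i \sim \chi^2(2n)$] $= P\left(\chi^2(2n) < \frac{\lambda}{\lambda_0} \cdot 9.59078\right) + P\left(\chi^2(2n) > \frac{\lambda}{\lambda_0} \cdot 34.1696\right)$.

Put $R = \lambda/\lambda_0$ and plot the power as a function of $R$. It is seen that the power is larger than 0.90 for $R < 0.35$ and $R > 3.0$.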
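Instead of reading the power of EX 146 off a plot, it can be computed directly. This is a minimal sketch, assuming a two-sided 5% test so that the critical values are the 2.5% and 97.5% points of $\chi^2(20)$; the lower point 9.59078 is taken from standard tables and is an assumption about the value used in the book, and the helper names are illustrative.

```python
import math

def chi2_sf_even(x, k):
    """P(chi2(k) > x) for even k: a finite Poisson-type sum."""
    if x <= 0:
        return 1.0
    half = x / 2
    term, total = 1.0, 0.0  # term = half^j / j!
    for j in range(k // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total

def power(R, n=10, lower=9.59078, upper=34.1696):
    """Pow(R) = P(chi2(2n) < R*lower) + P(chi2(2n) > R*upper), R = lambda/lambda0."""
    k = 2 * n
    return (1 - chi2_sf_even(R * lower, k)) + chi2_sf_even(R * upper, k)

for R in (0.35, 0.5, 1.0, 2.5, 3.0):
    print(R, round(power(R), 3))
```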
EX 147 From the data we get $n = 6$, $\sum x = 21$, $\sum x^2 = 91$, $\sum y = 21$, $\sum y^2 = 89.2$, $\sum xy = 90$,

$S_{xx} = 91 - (21)^2/6 = 17.5$, $S_{yy} = 89.2 - (21)^2/6 = 15.7$, $S_{xy} = 90 - (21)(21)/6 = 16.5$.

a) $\hat{\beta} = \frac{16.5}{17.5} = 0.9429$, $\hat{\alpha} = \frac{21}{6} - \hat{\beta} \cdot \frac{21}{6} = 0.2$, $SSE = 89.2 - \hat{\beta}^2 \cdot 91 = 8.3029$, $\hat{\sigma}^2 = \frac{8.3029}{6-2} = 2.0757$.

$H_0: \beta = 1$

$T = \frac{0.9429 - 1}{\sqrt{\frac{2.0757}{(6-2)17.5}}} = -0.33$, p-value $= 2 \cdot P(T(6-2) > 0.33) = 0.66$. Don't reject $H_0$.

$H_0: \alpha = 0$

$T = \frac{0.2}{\sqrt{2.0757\left(\frac{1}{6} + \frac{(21/6)^2}{(6-2)17.5}\right)}} = 0.237$, p-value $= 2 \cdot P(T(6-2) > 0.237) = 0.82$. Don't reject $H_0$.

b) Since $H_0: \alpha = 0$ can't be rejected we apply the model $E(Y \mid x) = \beta x$.

From EX 54, $\hat{\beta} = \frac{90}{91} = 0.9890$, $SSE = 89.2 - (90/91)^2 \cdot 91 = 0.1890$, $\hat{\sigma}^2 = \frac{SSE}{n-1} = 0.0378$. (The SSE in the latter expression is different from the one in a).)

$H_0: \beta = 1$

$T = \frac{0.9890 - 1}{\sqrt{0.0378/91}} = -0.54$, p-value $= 2 \cdot P(T(6-1) > 0.54) = 0.62$. Don't reject $H_0$.

The conclusion is that the model in b) is to be preferred.
EX 148 First we construct the following table:

$x$     $x^2$   $n$   $nx$   $nx^2$   $\bar{y}$   $n\bar{y}$   $nx\bar{y}$
1       1       4     4      4        1.00        4.0          4.0
3       9       5     15     45       3.64        18.2         54.6
5       25      3     15     75       7.23        21.7         108.5
10      100     4     40     400      12.725      50.9         509.0
15      225     4     60     900      18.225      72.9         1093.5
Total           20    134    1424                 167.7        1769.6

$\hat{\beta} = \frac{1769.6 - (167.7)(134)/20}{1424 - (134)^2/20} = 1.2277$, $\hat{\alpha} = \frac{167.7}{20} - \hat{\beta} \cdot \frac{134}{20}$

$x$     $n$   $\bar{y}$   $n(\bar{y} - \hat{\alpha} - \hat{\beta}x)^2$   $\sum(y - \bar{y})^2$
1       4     1.00        0.584     $5.1 - (4.0)^2/4 = 1.1$
3       5     3.64        0.195     $74.58 - (18.2)^2/5 = 8.332$
5       3     7.23        2.653     $158.97 - (21.7)^2/3 = 2.0067$
10      4     12.725      0.345     $648.61 - (50.9)^2/4 = 0.9075$
15      4     18.225      0.476     $1332.95 - (72.9)^2/4 = 4.3475$
Total                     4.253     15.6612

$H_0: E(Y \mid X = x) = \alpha + \beta \cdot x$

$F = \frac{4.253/(5-2)}{15.6612/(20-5)} = 1.358 \Rightarrow$ p-value $= P(F(3, 15) > 1.358) = 0.29$. No reason to reject the hypothesis.


EX 149

$x$     $y$        $n$   $nx$   $nx^2$   $\bar{y}$   $n\bar{y}$   $nx\bar{y}$
0.5     1.3 1.5    2     1      0.5      1.4         2.8          1.4
1       1.9 2.1    2     2      2        2.0         4.0          4.0
4       3.9 4.1    2     8      32       4.0         8.0          32.0
16      7.8 8.2    2     32     512      8.0         16.0         256.0
Total   30.8       8     43     546.5                30.8         293.4

$\hat{\beta} = \frac{293.4 - (30.8)(43)/8}{546.5 - (43)^2/8} = 0.4054$, $\hat{\alpha} = \frac{30.8}{8} - \hat{\beta} \cdot \frac{43}{8} = 1.67$

$x$     $n$   $y$        $\bar{y}$   $n(\bar{y} - \hat{\alpha} - \hat{\beta}x)^2$   $\sum(y - \bar{y})^2$
0.5     2     1.3 1.5    1.4         0.4469     0.01 + 0.01 = 0.02
1       2     1.9 2.1    2.0         0.0114     0.01 + 0.01 = 0.02
4       2     3.9 4.1    4.0         1.0037     0.01 + 0.01 = 0.02
16      2     7.8 8.2    8.0         0.0488     0.04 + 0.04 = 0.08
Total                                1.5108     0.14

$F = \frac{1.5108/(4-2)}{0.14/(8-4)} = 21.6 \Rightarrow$ p-value $= P(F(2, 4) > 21.6) = 0.007$. The linear model has to be rejected.
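The lack-of-fit computation of EX 149 can be reproduced from the raw observations. The following is a minimal sketch, not the book's own procedure: the function names are illustrative, and the p-value uses a closed form for the F distribution that is valid only when the numerator has 2 degrees of freedom, as is the case here.

```python
from statistics import mean

# Replicated observations at each x (data of EX 149)
data = {0.5: [1.3, 1.5], 1: [1.9, 2.1], 4: [3.9, 4.1], 16: [7.8, 8.2]}

def lack_of_fit(data):
    """Least-squares line plus lack-of-fit and pure-error sums of squares."""
    xs = [x for x, group in data.items() for _ in group]
    ys = [y for group in data.values() for y in group]
    n, k = len(ys), len(data)
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    ss_lof = sum(len(g) * (mean(g) - alpha - beta * x) ** 2 for x, g in data.items())
    ss_pe = sum((y - mean(g)) ** 2 for g in data.values() for y in g)
    f = (ss_lof / (k - 2)) / (ss_pe / (n - k))
    return beta, alpha, ss_lof, ss_pe, f, k - 2, n - k

def f_sf_num2(f, d2):
    """P(F(2, d2) > f); closed form valid only for numerator df = 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

beta, alpha, ss_lof, ss_pe, f, d1, d2 = lack_of_fit(data)
p_value = f_sf_num2(f, d2)
print(round(beta, 4), round(alpha, 2), round(ss_lof, 4), round(ss_pe, 2))
print(round(f, 1), round(p_value, 3))
```

Working from the unrounded estimates gives slightly different row contributions than a hand calculation with rounded $\hat{\alpha}$, but the total and the F statistic agree to the precision reported.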
EX 150 The model $y = a \cdot x^b$ may be a candidate. Here, $\ln(y) = \ln(a) + b\ln(x)$ or $y' = a' + b \cdot x'$.

We now test whether $E(y' \mid x') = a' + b \cdot x'$ is a proper model.

$x'$       $y'$             $n$   $nx'$     $n(x')^2$   $\bar{y}'$   $nx'\bar{y}'$   $n\bar{y}'$
-0.69315   0.2624 0.4055    2     -1.3863   0.9609      0.3339       -0.4629         0.6678
0          0.6419 0.7419    2     0         0           0.6919       0               1.3838
1.38629    1.3610 1.4110    2     2.7726    3.8436      1.3860       3.8428          2.7720
2.77259    2.0541 2.1041    2     5.5452    15.3745     2.0791       11.5291         4.1583
3.4657     8.9818           8     6.9315    20.1790     4.4909       14.9090         8.9818

(The last row contains the column totals.)

$\hat{b} = \frac{14.9090 - (8.9818)(6.9315)/8}{20.1790 - (6.9315)^2/8} = 0.5028$, $\hat{a}' = \frac{8.9818}{8} - \hat{b} \cdot \frac{6.9315}{8} = 0.687$

From this we get a new table:

$n$     $n(\bar{y}' - \hat{a}' - \hat{b}x')^2$   $\sum(y' - \bar{y}')^2$
2       0.000044     0.0102
2       0.000046     0.0050
2       0.000007     0.0013
2       0.000008     0.0013
Total   0.000105     0.0178

$F = \frac{0.000105/(4-2)}{0.0178/(8-4)} = 0.012 \Rightarrow$ p-value $= P(F(2, 4) > 0.012) = 0.99$. There is definitely no reason to reject the new model.


EX 151 The hypothesis to test is $H_0: \beta_2 = 0, \beta_3 = 0$ against $H_a: \beta_2 \neq 0, \beta_3 \neq 0$, i.e. there is no effect of the sex on body weight beyond the initial weight.


$F = \frac{(18.7454 - 18.4381)/(3-1)}{18.4381/(64-3-1)} = 0.50 \Rightarrow$ p-value $= P(F(2, 60) > 0.50) = 0.61$. No reason to reject $H_0$.
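The F test in EX 151 compares the error sum of squares of a restricted model with that of the full model. A minimal sketch of that comparison follows; the function names are illustrative, and the tail probability again relies on the closed form that holds only when two restrictions are tested.

```python
def nested_f(sse_restricted, sse_full, n_restrictions, df_full):
    """F statistic for testing linear restrictions in a regression model."""
    numerator = (sse_restricted - sse_full) / n_restrictions
    denominator = sse_full / df_full
    return numerator / denominator

def f_sf_num2(f, d2):
    """P(F(2, d2) > f); closed form valid only for numerator df = 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

# Values from EX 151: 64 observations, 3 regressors plus intercept, 2 restrictions
f_stat = nested_f(18.7454, 18.4381, n_restrictions=3 - 1, df_full=64 - 3 - 1)
p_value = f_sf_num2(f_stat, 64 - 3 - 1)
print(round(f_stat, 2), round(p_value, 2))
```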
EX 152 The model is linearized by the transformation $\ln(Q) = \ln(\alpha) + \beta_1\ln(I) + \beta_2\ln(P) + \varepsilon'$, where $\alpha' = \ln(\alpha)$.

From the print out we obtain:

Parameter   Estimate   T-value   $P(|T| > |T\text{-value}|)$
$\alpha'$   9.0795     1.43      0.1765
$\beta_1$   -0.9994    -1.92     0.0766
$\beta_2$   1.0779     16.23     < 0.0001

$R^2 = 0.990$ (a good agreement).

Here the T-value and the corresponding p-value are computed under the hypothesis that the parameter is zero. Since $\hat{\alpha} = e^{9.0795} = 8773$ the estimated model is $\hat{Q} = 8773 \cdot I^{-0.9994} P^{1.0779}$. A 'significance-fundamentalist' (a person who argues that all non-significant parameters should be deleted in a model) would object against including $I$ in the model.
Comment to the solution of EX 152. To solve non-linear problems by linearization and using least-squares techniques, as in the present example, is very frequent (perhaps too frequent) among econometricians. One should be aware that if $\hat{\beta}$ is unbiased for $\beta$, it does not follow that $P^{\hat{\beta}}$ is unbiased for $P^{\beta}$. Let's look at this a bit closer.

Put $g(\hat{\beta}) = P^{\hat{\beta}}$, where $E(\hat{\beta}) = \beta$. According to (14) in Ch. 3.3.2, $E(g(\hat{\beta})) \approx g(\beta) + \frac{1}{2}g''(\beta) \cdot V(\hat{\beta})$.

Now we have to find $g''(\beta)$. For simplicity, put $g = P^y \Rightarrow \ln(g) = y\ln(P) \Rightarrow \frac{g'}{g} = \ln(P) \Rightarrow g' = \ln(P) \cdot g \Rightarrow g'' = \ln(P) \cdot g' = (\ln(P))^2 \cdot g = (\ln(P))^2 P^y$. From this we finally obtain

$$E(P^{\hat{\beta}}) \approx P^{\beta} + \frac{1}{2}(\ln(P))^2 P^{\beta} \cdot V(\hat{\beta}) = P^{\beta}\left(1 + \frac{(\ln(P))^2 V(\hat{\beta})}{2}\right) > P^{\beta},$$

so there will be a positive bias. However, since $V(\hat{\beta}) = \frac{\text{const.}}{n}$, the bias can be ignored in large samples.
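To get a feeling for the size of this bias, the second-order approximation can be compared with an exact value in a special case. The sketch below assumes, purely for illustration, that $\hat{\beta}$ is normally distributed, in which case $E(P^{\hat{\beta}}) = P^{\beta}\exp\left((\ln P)^2 V(\hat{\beta})/2\right)$ exactly. The inputs are hypothetical: $P = 2$ is arbitrary, and the standard error 0.0664 is simply the printed estimate 1.0779 divided by its T-value 16.23.

```python
import math

def approx_mean(P, beta, var):
    """Second-order (delta-method) approximation of E(P**beta_hat)."""
    return P ** beta * (1 + 0.5 * math.log(P) ** 2 * var)

def exact_mean_normal(P, beta, var):
    """Exact E(P**beta_hat) when beta_hat ~ N(beta, var) (an assumption)."""
    return P ** beta * math.exp(0.5 * math.log(P) ** 2 * var)

P, beta, se = 2.0, 1.0779, 0.0664   # hypothetical illustration values
var = se ** 2
true_value = P ** beta
rel_bias_approx = approx_mean(P, beta, var) / true_value - 1
rel_bias_exact = exact_mean_normal(P, beta, var) / true_value - 1
print(rel_bias_approx, rel_bias_exact)
```

With a variance this small the relative bias is of the order of one tenth of a percent, which illustrates why it can be ignored in large samples.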


EX 153 $S(y) = e^{-\lambda y^{\alpha}} \Rightarrow \ln(S(y)) = -\lambda y^{\alpha} \Rightarrow y' = \ln(-\ln(S(y))) = \ln(\lambda) + \alpha\ln(y) = \lambda' + \alpha \cdot x$, say.

We thus run a regression of $y' = \ln(-\ln(S(y)))$ on $x = \ln(y)$. This yields:

Parameter    Estimate   Standard Error of Estimate
$\lambda'$   -0.0914    0.0378
$\alpha$     2.0470     0.0888

Here, Standard Error of Estimate $= \sqrt{\hat{V}(\text{Estimator})}$.

$H_0: \alpha = 1$ against $H_a: \alpha \neq 1$ is tested by $T = \frac{2.0470 - 1}{0.0888} = 11.79 \Rightarrow$ p-value $= 2 \cdot P(T(5-2) > 11.79) = 0.001$. Reject $H_0$!
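The test statistic and p-value of EX 153 can be checked numerically. This is a minimal sketch using the closed-form distribution function of Student's t with 3 degrees of freedom; the function name `t3_sf` is illustrative.

```python
import math

def t3_sf(t):
    """P(T > t) for Student's t with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3)
    return 0.5 - (math.atan(u) + u / (1 + u * u)) / math.pi

# Estimate and standard error of alpha from the regression output
t_stat = (2.0470 - 1) / 0.0888
p_value = 2 * t3_sf(abs(t_stat))
print(round(t_stat, 2), round(p_value, 3))
```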




References 


Casella, G. & Berger, R.L. 1990, Statistical Inference, Duxbury Press, Belmont, California. 


Cochran, W.G. 1934; The distribution of quadratic forms in a normal system, with applications to the 


analysis of covariance, Proc. Camb. Phil. Soc, 30, pp. 178-191. 

Cochran, W.G. 1950, “The Comparison of Percentages in Matched Samples; Biometrika, 37, pp. 256-266. 
Cochran, W.G. 1954, ‘Some methods for strengthening the common 7’ test’, Biometrics, 10, pp. 417-451. 
Cox, D.R. & Smith, W.L. 1954, ‘On the superposition of renewal processes, Biometrika, 41, pp. 91-99. 
Cramer, H. 1957, Mathematical Methods of Statistics, 7 edn, Princeton University Press, Princeton. 


Davis, C.E. 1976, “The effect of regression to the mean in epidemiologic and clinical studies; 
Am. J. Epidemiol., 104, pp. 493-498. 


Diggle, PJ., Liang, K-Y, & Zeeger, S.L. 1994, Analysis of Longitudinal Data, Oxford University Press, 
New York. 


Fisz, M. 1963, Probability Theory and Mathematical Statistics, 3" edn, Wiley, New York. 


Fukuda, M., Fukuda, K., Shimizu, T., Andersen, C.Y. & Byskov, A.G. 2002, ‘Parental periconceptional 
smoking and male:female ratio of new born infants, The Lancet, 359, pp. 1407-1408. 


Holm, S. 1979, ‘A Simple Sequentially Rejective Multiple Test Procedure’ Scandinavian J. of Statistics, 6, 
No 2, pp. 65-70. 


Hsiao, C. 2002, Analysis of Panel Data, Cambridge University Press, Cambridge. 


McNemar, Q. 1947, ‘Note on the sampling error of the difference between correlated proportions or 


percentages, Psychometrika, 12(2), pp. 153-157. 


Petzold, M. & Jonsson, R. 2003, ‘Maximum Likelihood Ratio-based small sample tests for random 


coefficients in linear regression, Working Paper in Economics, No 110, 2003. 


Rao, C.R. 1965, Linear Statistical Inference and Its Applications, Wiley, New York. 


Download free eBooks at bookboon.com 


Exercises in Statistical Inference 
with detailed solutions References 


Scheaffer, R.L., Mendenhall, W. Ott, R.L. & Gerow, K. 2012, Survey Sampling, 7th edn, Brooks/Cale 
CENGAGE Learning. 


Shuster, J.J. 1992, “Exact unconditional tables for significance in the 2 x 2 multinomial trial’ Statistics in 
Medicine, 11, pp. 913-922. 


Stuart, A., Ord, J.K. & Arnold, S. 1999, Kendall's Advanced Theory of Statistics, Vol 2A, Arnold, London. 


Wackerly, D., Mendenhall, W. & Scheaffer, R.L. 2007, Mathematical Statistics with Applications, 7" edn, 


Thomson, Toronto. 


Wonnacott, T. 1987, ‘Confidence intervals or hypothesis tests?, J. of Applied Statistics, 14, No 3, 
pp. 195-201. 


Yates, F. 1934, ‘Contingency table involving small numbers and the y’ test’, Supplement to the J. of the 
Royal Statistical Society, 1(2), 217-235. 




